File:    APPNOTE.TXT - .ZIP File Format Specification
Version: 6.3.10
Status: FINAL - replaces version 6.3.9
Revised: Nov 01, 2022
Copyright (c) 1989 - 2014, 2018, 2019, 2020, 2022 PKWARE Inc., All Rights Reserved.

1.0 Introduction
----------------

1.1 Purpose
-----------

   1.1.1 This specification is intended to define a cross-platform,
   interoperable file storage and transfer format.  Since its
   first publication in 1989, PKWARE, Inc. ("PKWARE") has remained
   committed to ensuring the interoperability of the .ZIP file
   format through periodic publication and maintenance of this
   specification.  We trust that all .ZIP compatible vendors and
   application developers that use and benefit from this format
   will share and support this commitment to interoperability.

1.2 Scope
---------

   1.2.1 ZIP is one of the most widely used compressed file formats. It is
   universally used to aggregate, compress, and encrypt files into a single
   interoperable container. No specific use or application need is
   defined by this format, and no specific implementation guidance is
   provided. This document provides details on the storage format for
   creating ZIP files.  Information is provided on the records and
   fields that describe what a ZIP file is.

1.3 Trademarks
--------------

   1.3.1 PKWARE, PKZIP, Smartcrypt, SecureZIP, and PKSFX are registered
   trademarks of PKWARE, Inc. in the United States and elsewhere.
   PKPatchMaker, Deflate64, and ZIP64 are trademarks of PKWARE, Inc.
   Other marks referenced within this document appear for identification
   purposes only and are the property of their respective owners.

1.4 Permitted Use
-----------------

   1.4.1 This document, "APPNOTE.TXT - .ZIP File Format Specification", is the
   exclusive property of PKWARE.  Use of the information contained in this
   document is permitted solely for the purpose of creating products,
   programs and processes that read and write files in the ZIP format
   subject to the terms and conditions herein.

   1.4.2 Use of the content of this document within other publications is
   permitted only through reference to this document.  Any reproduction
   or distribution of this document in whole or in part without prior
   written permission from PKWARE is strictly prohibited.

   1.4.3 Certain technological components provided in this document are the
   patented proprietary technology of PKWARE and as such require a
   separate, executed license agreement from PKWARE.  Applicable
   components are marked with the following, or similar, statement:
   'Refer to the section in this document entitled "Incorporating
   PKWARE Proprietary Technology into Your Product" for more information'.

1.5 Contacting PKWARE
---------------------

   1.5.1 If you have questions on this format, its use, or licensing, or if you
   wish to report defects, request changes or additions, please contact:

     PKWARE, Inc.
     201 E. Pittsburgh Avenue, Suite 400
     Milwaukee, WI 53204
     +1-414-289-9788
     +1-414-289-9789 FAX
     zipformat@pkware.com

   1.5.2 Information about this format and a reference copy of this document
   is publicly available at:

     http://www.pkware.com/appnote

1.6 Disclaimer
--------------

   1.6.1 Although PKWARE will attempt to supply current and accurate
   information relating to its file formats, algorithms, and the
   subject programs, the possibility of error or omission cannot
   be eliminated. PKWARE therefore expressly disclaims any warranty
   that the information contained in the associated materials relating
   to the subject programs and/or the format of the files created or
   accessed by the subject programs and/or the algorithms used by
   the subject programs, or any other matter, is current, correct or
   accurate as delivered.  Any risk of damage due to any possible
   inaccurate information is assumed by the user of the information.
   Furthermore, the information relating to the subject programs
   and/or the file formats created or accessed by the subject
   programs and/or the algorithms used by the subject programs is
   subject to change without notice.

2.0 Revisions
-------------

2.1 Document Status
-------------------

   2.1.1 If the STATUS of this file is marked as DRAFT, the content
   defines proposed revisions to this specification which may consist
   of changes to the ZIP format itself, or that may consist of other
   content changes to this document.  Versions of this document and
   the format in DRAFT form may be subject to modification prior to
   publication with a STATUS of FINAL. DRAFT versions are published
   periodically to notify the ZIP community of pending changes and to
   provide an opportunity for review and comment.

   2.1.2 Versions of this document having a STATUS of FINAL are
   considered to be in the final form for that version of the document
   and are not subject to further change until a new, higher-numbered
   version of the document is published.  Newer versions of this format
   specification are intended to remain interoperable with all prior
   versions whenever technically possible.

2.2 Change Log
--------------

   Version       Change Description                        Date
   -------       ------------------                       ----------
   5.2           -Single Password Symmetric Encryption    07/16/2003
                  storage

   6.1.0         -Smartcard compatibility                 01/20/2004
                 -Documentation on certificate storage

   6.2.0         -Introduction of Central Directory       04/26/2004
                  Encryption for encrypting metadata
                 -Added OS X to Version Made By values

   6.2.1         -Added Extra Field placeholder for       04/01/2005
                  POSZIP using ID 0x4690

                 -Clarified size field on
                  "zip64 end of central directory record"

   6.2.2         -Documented Final Feature Specification  01/06/2006
                  for Strong Encryption

                 -Clarifications and typographical
                  corrections

   6.3.0         -Added tape positioning storage          09/29/2006
                  parameters

                 -Expanded list of supported hash algorithms

                 -Expanded list of supported compression
                  algorithms

                 -Expanded list of supported encryption
                  algorithms

                 -Added option for Unicode filename
                  storage

                 -Clarifications for consistent use
                  of Data Descriptor records

                 -Added additional "Extra Field"
                  definitions

   6.3.1         -Corrected standard hash values for      04/11/2007
                  SHA-256/384/512

   6.3.2         -Added compression method 97             09/28/2007

                 -Documented InfoZIP "Extra Field"
                  values for UTF-8 file name and
                  file comment storage

   6.3.3         -Formatting changes to support           09/01/2012
                  easier referencing of this APPNOTE
                  from other documents and standards

   6.3.4         -Address change                          10/01/2014

   6.3.5         -Documented compression methods 16       11/31/2018
                  and 99 (4.4.5, 4.6.1, 5.11, 5.17,
                  APPENDIX E)

                 -Corrected several typographical
                  errors (2.1.2, 3.2, 4.1.1, 10.2)

                 -Marked legacy algorithms as no
                  longer suitable for use (4.4.5.1)

                 -Added clarity on MS-DOS time format
                  (4.4.6)

                 -Assigned extra field ID for Timestamps
                  (4.5.2)

                 -Field code description correction (A.2)

                 -More consistent use of MAY/SHOULD/MUST

                 -Expanded 0x0065 record attribute codes (B.2)

                 -Initial information on 0x0022 Extra Data

   6.3.6         -Corrected typographical error           04/26/2019
                  (4.4.1.3)

   6.3.7         -Added Zstandard compression method ID
                  (4.4.5)

                 -Corrected several reported typos

                 -Marked intended use for general purpose bit 14

                 -Added Data Stream Alignment Extra Data info
                  (4.6.11)

   6.3.8         -Resolved Zstandard compression method ID conflict
                  (4.4.5)

                 -Added additional compression method ID values in use

   6.3.9         -Corrected a typo in Data Stream Alignment description
                  (4.6.11)

   6.3.10        -Added several z/OS attribute values for APPENDIX B

                 -Added several additional 3rd party Extra Field mappings
                  (thanks to Armijn Hemel @tjaldur.nl for forwarding info
                  on several of the Header IDs)


3.0 Notations
-------------

   3.1 Use of the term MUST or SHALL indicates a required element.

   3.2 MUST NOT or SHALL NOT indicates an element is prohibited from use.

   3.3 SHOULD indicates a RECOMMENDED element.

   3.4 SHOULD NOT indicates an element NOT RECOMMENDED for use.

   3.5 MAY indicates an OPTIONAL element.


4.0 ZIP Files
-------------

4.1 What is a ZIP file
----------------------

   4.1.1 ZIP files MAY be identified by the standard .ZIP file extension
   although use of a file extension is not required.  Use of the
   extension .ZIPX is also recognized and MAY be used for ZIP files.
   Other common file extensions using the ZIP format include .JAR, .WAR,
   .DOCX, .XLSX, .PPTX, .ODT, .ODS, .ODP and others. Programs reading or
   writing ZIP files SHOULD rely on internal record signatures described
   in this document to identify files in this format.

   4.1.2 ZIP files SHOULD contain at least one file and MAY contain
   multiple files.

   4.1.3 Data compression MAY be used to reduce the size of files
   placed into a ZIP file, but is not required.  This format supports the
   use of multiple data compression algorithms.  When compression is used,
   one of the documented compression algorithms MUST be used.  Implementors
   are advised to experiment with their data to determine which of the
   available algorithms provides the best compression for their needs.
   Compression method 8 (Deflate) is the method used by default by most
   ZIP compatible application programs.
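
   As an illustration only, the Python sketch below produces and reads
   back a raw Deflate stream with the standard zlib module.  It assumes,
   as Python's own zipfile module does, that method 8 data is a bare
   Deflate stream (RFC 1951) without a zlib header or Adler-32 trailer,
   which zlib selects when given a negative window size (wbits=-15).
   The function names are hypothetical.

```python
import zlib


def deflate_raw(data: bytes, level: int = 6) -> bytes:
    """Compress to a raw Deflate stream (negative wbits: no zlib wrapper)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def inflate_raw(stream: bytes) -> bytes:
    """Decompress a raw Deflate stream such as one made by deflate_raw()."""
    decompressor = zlib.decompressobj(-15)
    return decompressor.decompress(stream) + decompressor.flush()


payload = b"ZIP " * 1000          # highly repetitive sample input
packed = deflate_raw(payload)
restored = inflate_raw(packed)
print(len(payload), "->", len(packed), "bytes")
```

   The compression level is an arbitrary choice here; any level yields
   a stream that the same inflate call can decode.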


   4.1.4 Data encryption MAY be used to protect files within a ZIP file.
   Keying methods supported for encryption within this format include
   passwords and public/private keys.  Either MAY be used individually
   or in combination. Encryption MAY be applied to individual files.
   Additional security MAY be used through the encryption of ZIP file
   metadata stored within the Central Directory. See the section on the
   Strong Encryption Specification for information. Refer to the section
   in this document entitled "Incorporating PKWARE Proprietary Technology
   into Your Product" for more information.

   4.1.5 Data integrity MUST be provided for each file using CRC32.
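
   For illustration, the CRC-32 used by common ZIP tools is the
   reflected CRC-32 with polynomial 0xEDB88320, an initial value of
   0xFFFFFFFF and a final one's complement; Python's zlib.crc32
   computes it.  The slow, bit-at-a-time function below is a
   hypothetical sketch of that algorithm, checked against zlib.crc32
   and the well-known check value 0xCBF43926 for the ASCII string
   "123456789".

```python
import zlib


def crc32_bitwise(data: bytes, crc: int = 0) -> int:
    """Reflected CRC-32 (polynomial 0xEDB88320), one bit at a time.

    Passing a previous result as `crc` continues the computation, the
    same running-value convention that zlib.crc32 uses.
    """
    crc ^= 0xFFFFFFFF                 # pre-condition (undo prior complement)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF           # post-condition: one's complement


check = crc32_bitwise(b"123456789")
print(hex(check))                     # 0xcbf43926
print(check == zlib.crc32(b"123456789"))
```

   Production code would normally call zlib.crc32 (or a table-driven
   equivalent) rather than this bitwise loop.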

   4.1.6 Additional data integrity MAY be included through the use of
   digital signatures.  Individual files MAY be signed with one or more
   digital signatures. The Central Directory, if signed, MUST use a
   single signature.

   4.1.7 Files MAY be placed within a ZIP file uncompressed (stored).
   The term "stored" as used in the context of this document means the file
   is copied into the ZIP file uncompressed.

   4.1.8 Each data file placed into a ZIP file MAY be compressed, stored,
   encrypted or digitally signed independent of how other data files in the
   same ZIP file are archived.

   4.1.9 ZIP files MAY be streamed, split into segments (on fixed or on
   removable media) or "self-extracting".  Self-extracting ZIP
   files MUST include extraction code for a target platform within
   the ZIP file.

   4.1.10 Extensibility is provided for platform or application specific
   needs through extra data fields that MAY be defined for custom
   purposes.  Extra data definitions MUST NOT conflict with existing
   documented record definitions.

   4.1.11 Common uses for ZIP MAY also include the use of manifest files.
   Manifest files store application specific information within a file stored
   within the ZIP file.  This manifest file SHOULD be the first file in the
   ZIP file. This specification does not provide any information or guidance on
   the use of manifest files within ZIP files.  Refer to the application developer
   for information on using manifest files and for any additional profile
   information on using ZIP within an application.

   4.1.12 ZIP files MAY be placed within other ZIP files.

4.2 ZIP Metadata
----------------

   4.2.1 ZIP files are identified by metadata consisting of defined record types
   containing the storage information necessary for maintaining the files
   placed into a ZIP file.  Each record type MUST be identified using a header
   signature that identifies the record type.  Signature values begin with the
   two byte constant marker of 0x4b50, representing the characters "PK".
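
   Because values are stored in little-endian byte order (4.3.3), a
   signature such as 0x04034b50 appears on disk as the bytes 50 4B 03
   04, so each record begins with the ASCII characters "PK".  The short
   Python check below packs the signatures listed in section 4.3; the
   dictionary and its labels exist only for this illustration.

```python
import struct

# Record signatures from section 4.3. The data descriptor value is the
# commonly adopted, optional one described in 4.3.9.3.
SIGNATURES = {
    "local file header": 0x04034B50,
    "data descriptor": 0x08074B50,
    "archive extra data record": 0x08064B50,
    "central directory header": 0x02014B50,
    "digital signature": 0x05054B50,
    "zip64 end of central directory record": 0x06064B50,
    "zip64 end of central directory locator": 0x07064B50,
    "end of central directory record": 0x06054B50,
}

for name, value in SIGNATURES.items():
    on_disk = struct.pack("<I", value)    # little-endian, as stored in the file
    assert on_disk[:2] == b"PK"
    print(f"{name:<40} {on_disk.hex(' ')}")
```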


4.3 General Format of a .ZIP file
---------------------------------

   4.3.1 A ZIP file MUST contain an "end of central directory record". A ZIP
   file containing only an "end of central directory record" is considered an
   empty ZIP file.  Files MAY be added or replaced within a ZIP file, or deleted.
   A ZIP file MUST have only one "end of central directory record".  Other
   records defined in this specification MAY be used as needed to support
   storage requirements for individual ZIP files.

   4.3.2 Each file placed into a ZIP file MUST be preceded by a "local
   file header" record for that file.  Each "local file header" MUST be
   accompanied by a corresponding "central directory header" record within
   the central directory section of the ZIP file.

   4.3.3 Files MAY be stored in arbitrary order within a ZIP file.  A ZIP
   file MAY span multiple volumes or it MAY be split into user-defined
   segment sizes. All values MUST be stored in little-endian byte order unless
   otherwise specified in this document for a specific data element.

   4.3.4 Compression MUST NOT be applied to a "local file header", an "encryption
   header", or an "end of central directory record".  Individual "central
   directory records" MUST NOT be compressed, but the aggregate of all central
   directory records MAY be compressed.

   4.3.5 File data MAY be followed by a "data descriptor" for the file.  Data
   descriptors are used to facilitate ZIP file streaming.

   4.3.6 Overall .ZIP file format:

      [local file header 1]
      [encryption header 1]
      [file data 1]
      [data descriptor 1]
      .
      .
      .
      [local file header n]
      [encryption header n]
      [file data n]
      [data descriptor n]
      [archive decryption header]
      [archive extra data record]
      [central directory header 1]
      .
      .
      .
      [central directory header n]
      [zip64 end of central directory record]
      [zip64 end of central directory locator]
      [end of central directory record]
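
   As a sketch of this layout in its simplest form, the Python code
   below assembles by hand a one-entry archive holding an unencrypted,
   stored (method 0) file: a local file header, the file data, one
   central directory header and the end of central directory record.
   All optional records are omitted.  The helper name, the fixed
   1980-01-01 timestamp and the version values 20 and 10 are
   illustrative assumptions; Python's zipfile module is used only to
   check that the result can be read back.

```python
import io
import struct
import zipfile
import zlib


def build_stored_zip(name: str, data: bytes) -> bytes:
    """Assemble a minimal single-entry archive with stored (method 0) data."""
    fname = name.encode("ascii")
    crc = zlib.crc32(data)
    dos_time = 0                              # 00:00:00
    dos_date = (0 << 9) | (1 << 5) | 1        # 1980-01-01 (year offset 0)

    local_header = struct.pack(
        "<IHHHHHIIIHH",
        0x04034B50,                           # local file header signature
        10, 0, 0,                             # version needed, flags, method 0
        dos_time, dos_date,
        crc, len(data), len(data),            # crc-32, compressed, uncompressed
        len(fname), 0,                        # file name length, extra length
    ) + fname

    central_header = struct.pack(
        "<IHHHHHHIIIHHHHHII",
        0x02014B50,                           # central file header signature
        20, 10, 0, 0,                         # made by, needed, flags, method
        dos_time, dos_date,
        crc, len(data), len(data),
        len(fname), 0, 0,                     # name, extra, comment lengths
        0, 0, 0,                              # disk start, internal, external attrs
        0,                                    # relative offset of local header
    ) + fname

    cd_offset = len(local_header) + len(data)
    end_record = struct.pack(
        "<IHHHHIIH",
        0x06054B50,                           # end of central dir signature
        0, 0,                                 # this disk, disk with CD start
        1, 1,                                 # entries on this disk, total
        len(central_header), cd_offset,
        0,                                    # .ZIP file comment length
    )
    return local_header + data + central_header + end_record


archive = build_stored_zip("hello.txt", b"hello, world\n")
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(zf.namelist(), zf.read("hello.txt"))
```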


   4.3.7  Local file header:

      local file header signature     4 bytes  (0x04034b50)
      version needed to extract       2 bytes
      general purpose bit flag        2 bytes
      compression method              2 bytes
      last mod file time              2 bytes
      last mod file date              2 bytes
      crc-32                          4 bytes
      compressed size                 4 bytes
      uncompressed size               4 bytes
      file name length                2 bytes
      extra field length              2 bytes

      file name (variable size)
      extra field (variable size)
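
      A reader typically decodes the 30 bytes of fixed fields and then
      slices out the two variable-length fields that follow.  The
      Python sketch below does this with the struct module; the
      LocalFileHeader tuple and its field names are illustrative, and
      a small archive written by Python's zipfile module serves as
      test input.  When bit 3 of the general purpose bit flag is set,
      the crc-32 and size fields read here may be zero, with the real
      values carried by the data descriptor (4.3.9).

```python
import io
import struct
import zipfile
from typing import NamedTuple

LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")   # fixed fields: 30 bytes


class LocalFileHeader(NamedTuple):
    version_needed: int
    flags: int
    method: int
    mod_time: int
    mod_date: int
    crc32: int
    compressed_size: int
    uncompressed_size: int
    file_name: bytes
    extra_field: bytes
    data_offset: int      # first byte after the header: encryption header or file data


def parse_local_header(buf: bytes, offset: int = 0) -> LocalFileHeader:
    (sig, needed, flags, method, mod_time, mod_date, crc, csize, usize,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if sig != 0x04034B50:
        raise ValueError(f"no local file header signature at offset {offset}")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_len
    data_start = extra_start + extra_len
    return LocalFileHeader(needed, flags, method, mod_time, mod_date, crc,
                           csize, usize, buf[name_start:extra_start],
                           buf[extra_start:data_start], data_start)


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"abc")
archive = buffer.getvalue()

header = parse_local_header(archive)
print(header.file_name, header.method, header.uncompressed_size)
```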

   4.3.8  File data

      The compressed or stored data for a file SHOULD be placed
      immediately following the local header for that file.
      If the file is encrypted, the encryption header for the file
      SHOULD be placed after the local header and before the file
      data. The series of [local file header][encryption header]
      [file data][data descriptor] repeats for each file in the
      .ZIP archive.

      Zero-byte files, directories, and other file types that
      contain no content MUST NOT include file data.

   4.3.9  Data descriptor:

        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes

      4.3.9.1 This descriptor MUST exist if bit 3 of the general
      purpose bit flag is set (see below).  It is byte aligned
      and immediately follows the last byte of compressed data.
      This descriptor SHOULD be used only when it was not possible to
      seek in the output .ZIP file, e.g., when the output .ZIP file
      was standard output or a non-seekable device.  For ZIP64(tm) format
      archives, the compressed and uncompressed sizes are 8 bytes each.

      4.3.9.2 When compressing files, compressed and uncompressed sizes
      SHOULD be stored in ZIP64 format (as 8 byte values) when a
      file's size exceeds 0xFFFFFFFF.  However, ZIP64 format MAY be
      used regardless of the size of a file.  When extracting, if
      the zip64 extended information extra field is present for
      the file, the compressed and uncompressed sizes will be 8
      byte values.

      4.3.9.3 Although not originally assigned a signature, the value
      0x08074b50 has commonly been adopted as a signature value
      for the data descriptor record.  Implementers SHOULD be
      aware that ZIP files MAY be encountered with or without this
      signature marking data descriptors and SHOULD account for
      either case when reading ZIP files to ensure compatibility.

      4.3.9.4 When writing ZIP files, implementors SHOULD include the
      signature value marking the data descriptor record.  When
      the signature is used, the fields currently defined for
      the data descriptor record will immediately follow the
      signature.
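
      Because the signature is optional, checking the first four bytes
      is not sufficient on its own: a crc-32 value can itself equal
      0x08074b50.  One possible approach, sketched below in Python, is
      to try the signed layout first and accept whichever layout's
      compressed size matches the number of compressed bytes the
      reader actually consumed for the entry (for example, as reported
      by its decompressor).  The function and parameter names are
      hypothetical, and the zip64 argument stands in for whatever
      logic a reader uses to decide that 8-byte sizes apply (4.3.9.2).

```python
import struct
from typing import NamedTuple

DD_SIGNATURE = 0x08074B50


class DataDescriptor(NamedTuple):
    crc32: int
    compressed_size: int
    uncompressed_size: int
    length: int               # bytes occupied, including any signature


def read_data_descriptor(buf: bytes, offset: int, consumed: int,
                         zip64: bool = False) -> DataDescriptor:
    """Parse a data descriptor at `offset`, with or without its signature.

    `consumed` is the number of compressed bytes already read for this
    entry; it separates a real signature from a crc-32 that happens to
    equal 0x08074b50.
    """
    fields = struct.Struct("<IQQ" if zip64 else "<III")
    starts = []
    if buf[offset:offset + 4] == struct.pack("<I", DD_SIGNATURE):
        starts.append(offset + 4)         # signed form: try it first
    starts.append(offset)                 # unsigned form
    for start in starts:
        if start + fields.size > len(buf):
            continue                      # not enough bytes for this layout
        crc, csize, usize = fields.unpack_from(buf, start)
        if csize == consumed:
            return DataDescriptor(crc, csize, usize, start + fields.size - offset)
    raise ValueError("no data descriptor layout matches the consumed size")


signed = struct.pack("<IIII", DD_SIGNATURE, 0x1234ABCD, 5, 9)
unsigned = struct.pack("<III", 0x1234ABCD, 5, 9)
print(read_data_descriptor(signed, 0, consumed=5))
print(read_data_descriptor(unsigned, 0, consumed=5))
```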

      4.3.9.5 An extensible data descriptor will be released in a
      future version of this APPNOTE.  This new record is intended to
      resolve conflicts with the use of this record going forward,
      and to provide better support for streamed file processing.

      4.3.9.6 When the Central Directory Encryption method is used,
      the data descriptor record is not required, but MAY be used.
      If present, and bit 3 of the general purpose bit flag is set to
      indicate its presence, the values in fields of the data descriptor
      record MUST be set to binary zeros.  See the section on the Strong
      Encryption Specification for information. Refer to the section in
      this document entitled "Incorporating PKWARE Proprietary Technology
      into Your Product" for more information.


   4.3.10  Archive decryption header:

      4.3.10.1 The Archive Decryption Header was introduced in version 6.2
      of the ZIP format specification.  This record exists in support
      of the Central Directory Encryption Feature implemented as part of
      the Strong Encryption Specification as described in this document.
      When the Central Directory Structure is encrypted, this decryption
      header MUST precede the encrypted data segment.

      4.3.10.2 The encrypted data segment SHALL consist of the Archive
      extra data record (if present) and the encrypted Central Directory
      Structure data.  The format of this data record is identical to the
      Decryption header record preceding compressed file data.  If the
      central directory structure is encrypted, the location of the start of
      this data record is determined using the Start of Central Directory
      field in the Zip64 End of Central Directory record.  See the
      section on the Strong Encryption Specification for information
      on the fields used in the Archive Decryption Header record.
      Refer to the section in this document entitled "Incorporating
      PKWARE Proprietary Technology into Your Product" for more information.


   4.3.11  Archive extra data record:

        archive extra data signature    4 bytes  (0x08064b50)
        extra field length              4 bytes
        extra field data                (variable size)

      4.3.11.1 The Archive Extra Data Record was introduced in version 6.2
      of the ZIP format specification.  This record MAY be used in support
      of the Central Directory Encryption Feature implemented as part of
      the Strong Encryption Specification as described in this document.
      When present, this record MUST immediately precede the central
      directory data structure.

      4.3.11.2 The size of this data record SHALL be included in the
      Size of the Central Directory field in the End of Central
      Directory record.  If the central directory structure is compressed,
      but not encrypted, the location of the start of this data record is
      determined using the Start of Central Directory field in the Zip64
      End of Central Directory record. Refer to the section in this document
      entitled "Incorporating PKWARE Proprietary Technology into Your
      Product" for more information.

   4.3.12  Central directory structure:

      [central directory header 1]
      .
      .
      .
      [central directory header n]
      [digital signature]

      File header:

        central file header signature   4 bytes  (0x02014b50)
        version made by                 2 bytes
        version needed to extract       2 bytes
        general purpose bit flag        2 bytes
        compression method              2 bytes
        last mod file time              2 bytes
        last mod file date              2 bytes
        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes
        file name length                2 bytes
        extra field length              2 bytes
        file comment length             2 bytes
        disk number start               2 bytes
        internal file attributes        2 bytes
        external file attributes        4 bytes
        relative offset of local header 4 bytes

        file name (variable size)
        extra field (variable size)
        file comment (variable size)
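
      Since each central directory header is 46 bytes of fixed fields
      followed by three variable-length fields, a reader can walk the
      directory by skipping the file name, extra field and file
      comment of each entry in turn.  In the Python sketch below, the
      entry count and central directory offset come from the end of
      central directory record (4.3.16); the example assumes an
      archive with no .ZIP file comment, so that record occupies the
      last 22 bytes.  CentralEntry and iter_central_directory are
      illustrative names.

```python
import io
import struct
import zipfile
from typing import Iterator, NamedTuple

CENTRAL_HEADER = struct.Struct("<IHHHHHHIIIHHHHHII")   # fixed fields: 46 bytes


class CentralEntry(NamedTuple):
    file_name: bytes
    method: int
    crc32: int
    compressed_size: int
    uncompressed_size: int
    local_header_offset: int


def iter_central_directory(buf: bytes, cd_offset: int,
                           count: int) -> Iterator[CentralEntry]:
    pos = cd_offset
    for _ in range(count):
        (sig, _made_by, _needed, _flags, method, _time, _date, crc, csize,
         usize, name_len, extra_len, comment_len, _disk_start, _internal,
         _external, local_offset) = CENTRAL_HEADER.unpack_from(buf, pos)
        if sig != 0x02014B50:
            raise ValueError(f"no central file header signature at offset {pos}")
        name_start = pos + CENTRAL_HEADER.size
        yield CentralEntry(buf[name_start:name_start + name_len], method, crc,
                           csize, usize, local_offset)
        # Advance past the fixed part and all three variable-length fields.
        pos = name_start + name_len + extra_len + comment_len


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("one.txt", b"1")
    zf.writestr("two.txt", b"22")
archive = buffer.getvalue()

# No .ZIP file comment here, so the end of central directory record is the
# last 22 bytes; take the total entry count and the central directory offset.
_, _, _, _, total, _, cd_offset, _ = struct.unpack("<IHHHHIIH", archive[-22:])
entries = list(iter_central_directory(archive, cd_offset, total))
for entry in entries:
    print(entry.file_name, entry.uncompressed_size, entry.local_header_offset)
```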

   4.3.13 Digital signature:

        header signature                4 bytes  (0x05054b50)
        size of data                    2 bytes
        signature data (variable size)

      With the introduction of the Central Directory Encryption
      feature in version 6.2 of this specification, the Central
      Directory Structure MAY be stored both compressed and encrypted.
      Although not required, it is assumed that when the Central
      Directory Structure is encrypted, it will also be compressed
      for greater storage efficiency.  Information on the
      Central Directory Encryption feature can be found in the section
      describing the Strong Encryption Specification. The Digital
      Signature record will be neither compressed nor encrypted.

   4.3.14  Zip64 end of central directory record

        zip64 end of central dir
        signature                       4 bytes  (0x06064b50)
        size of zip64 end of central
        directory record                8 bytes
        version made by                 2 bytes
        version needed to extract       2 bytes
        number of this disk             4 bytes
        number of the disk with the
        start of the central directory  4 bytes
        total number of entries in the
        central directory on this disk  8 bytes
        total number of entries in the
        central directory               8 bytes
        size of the central directory   8 bytes
        offset of start of central
        directory with respect to
        the starting disk number        8 bytes
        zip64 extensible data sector    (variable size)

      4.3.14.1 The value stored into the "size of zip64 end of central
      directory record" SHOULD be the size of the remaining
      record and SHOULD NOT include the leading 12 bytes.

      Size = SizeOfFixedFields + SizeOfVariableData - 12.
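
      As a worked example, the fixed fields above total
      4 + 8 + 2 + 2 + 4 + 4 + 8 + 8 + 8 + 8 = 56 bytes, so a record
      with no extensible data stores 56 - 12 = 44 in its size field.
      The Python sketch below builds a Version 1 record for an archive
      on a single disk.  The helper name and its default version
      values (45, that is 4.5) are illustrative assumptions.

```python
import struct

# Fixed fields of the Version 1 record, in order (56 bytes in total).
ZIP64_EOCD = struct.Struct("<IQHHIIQQQQ")


def build_zip64_eocd(entries: int, cd_size: int, cd_offset: int,
                     extensible_data: bytes = b"",
                     made_by: int = 45, needed: int = 45) -> bytes:
    # The size field excludes the leading 12 bytes: the 4-byte signature
    # and the 8-byte size field itself.
    record_size = ZIP64_EOCD.size + len(extensible_data) - 12
    fixed = ZIP64_EOCD.pack(
        0x06064B50, record_size, made_by, needed,
        0, 0,                  # this disk, disk with start of central dir
        entries, entries,      # entries on this disk, total entries
        cd_size, cd_offset,
    )
    return fixed + extensible_data


record = build_zip64_eocd(entries=3, cd_size=150, cd_offset=5_000_000_000)
(size_field,) = struct.unpack_from("<Q", record, 4)
print(len(record), size_field)          # 56 44
```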

      4.3.14.2 The above record structure defines Version 1 of the
      zip64 end of central directory record. Version 1 was
      implemented in versions of this specification preceding
      6.2 in support of the ZIP64 large file feature. The
      introduction of the Central Directory Encryption feature
      implemented in version 6.2 as part of the Strong Encryption
      Specification defines Version 2 of this record structure.
      Refer to the section describing the Strong Encryption
      Specification for details on the version 2 format for
      this record. Refer to the section in this document entitled
      "Incorporating PKWARE Proprietary Technology into Your Product"
      for more information applicable to use of Version 2 of this
      record.

      4.3.14.3 Special purpose data MAY reside in the zip64 extensible
      data sector field following either a V1 or V2 version of this
      record.  To ensure identification of this special purpose data
      it MUST include an identifying header block consisting of the
      following:

         Header ID  -  2 bytes
         Data Size  -  4 bytes

      The Header ID field indicates the type of data that is in the
      data block that follows.

      Data Size identifies the number of bytes that follow for this
      data block type.

      4.3.14.4 Multiple special purpose data blocks MAY be present.
      Each MUST be preceded by a Header ID and Data Size field.  Current
      mappings of Header ID values supported in this field are as
      defined in APPENDIX C.
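
      A sketch of how a reader might split an extensible data sector
      into its blocks, each a 2-byte Header ID followed by a 4-byte
      Data Size and that many bytes of data.  The Python below is
      illustrative; the function name is hypothetical and the IDs in
      the example are arbitrary placeholders, not values from
      APPENDIX C.

```python
import struct
from typing import Iterator, Tuple

BLOCK_HEADER = struct.Struct("<HI")   # Header ID (2 bytes) + Data Size (4 bytes)


def iter_extensible_blocks(sector: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (header_id, data) pairs from a zip64 extensible data sector."""
    pos = 0
    while pos < len(sector):
        if pos + BLOCK_HEADER.size > len(sector):
            raise ValueError(f"truncated block header at offset {pos}")
        header_id, data_size = BLOCK_HEADER.unpack_from(sector, pos)
        start = pos + BLOCK_HEADER.size
        end = start + data_size
        if end > len(sector):
            raise ValueError(f"block 0x{header_id:04x} overruns the sector")
        yield header_id, sector[start:end]
        pos = end


# 0x1234 and 0xBEEF are arbitrary placeholder IDs, not APPENDIX C values.
sector = (BLOCK_HEADER.pack(0x1234, 3) + b"abc"
          + BLOCK_HEADER.pack(0xBEEF, 0))
print(list(iter_extensible_blocks(sector)))   # [(4660, b'abc'), (48879, b'')]
```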

   4.3.15 Zip64 end of central directory locator

      zip64 end of central dir locator
      signature                       4 bytes  (0x07064b50)
      number of the disk with the
      start of the zip64 end of
      central directory               4 bytes
      relative offset of the zip64
      end of central directory record 8 bytes
      total number of disks           4 bytes
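
      Following the layout in 4.3.6, the locator, when present, sits
      immediately before the end of central directory record, so a
      reader that has found that record can look 20 bytes earlier for
      the signature 0x07064b50.  A minimal Python sketch; the function
      name is hypothetical, and the value it returns is the "relative
      offset of the zip64 end of central directory record" field.

```python
import struct
from typing import Optional

ZIP64_LOCATOR = struct.Struct("<IIQI")   # 20 bytes


def zip64_eocd_offset(buf: bytes, eocd_pos: int) -> Optional[int]:
    """Return the zip64 EOCD offset recorded in the locator, or None if absent."""
    loc_pos = eocd_pos - ZIP64_LOCATOR.size
    if loc_pos < 0:
        return None
    sig, _disk, offset, _total_disks = ZIP64_LOCATOR.unpack_from(buf, loc_pos)
    return offset if sig == 0x07064B50 else None


eocd = b"PK\x05\x06" + bytes(18)                 # placeholder EOCD body
with_locator = ZIP64_LOCATOR.pack(0x07064B50, 0, 123_456, 1) + eocd
print(zip64_eocd_offset(with_locator, eocd_pos=20))          # 123456
print(zip64_eocd_offset(b"\x00" * 20 + eocd, eocd_pos=20))   # None
```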
    631         
    632    4.3.16  End of central directory record:
    633 
    634       end of central dir signature    4 bytes  (0x06054b50)
    635       number of this disk             2 bytes
    636       number of the disk with the
    637       start of the central directory  2 bytes
    638       total number of entries in the
    639       central directory on this disk  2 bytes
    640       total number of entries in
    641       the central directory           2 bytes
    642       size of the central directory   4 bytes
    643       offset of start of central
    644       directory with respect to
    645       the starting disk number        4 bytes
    646       .ZIP file comment length        2 bytes
    647       .ZIP file comment       (variable size)
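      A minimal Python sketch of locating and decoding the record above,
      assuming the archive (or at least its tail) is available as bytes.
      Because the comment is variable length, the sketch scans backward
      for the signature; accepting a match only when its comment length
      reaches exactly to the end of the data is a common heuristic, not a
      rule stated here.  It also checks whether a zip64 end of central
      directory locator (4.3.15, 20 bytes) immediately precedes the
      record.  read_eocd and EndOfCentralDir are hypothetical names.

```python
import struct
from typing import NamedTuple, Optional

EOCD_SIG = 0x06054B50      # bytes "PK\x05\x06" on disk
LOCATOR_SIG = 0x07064B50   # bytes "PK\x06\x07" on disk
EOCD_FMT = "<IHHHHIIH"     # fixed 22-byte part of the 4.3.16 record
LOCATOR_FMT = "<IIQI"      # 20-byte 4.3.15 record


class EndOfCentralDir(NamedTuple):
    this_disk: int
    cd_start_disk: int
    entries_this_disk: int
    entries_total: int
    cd_size: int
    cd_offset: int
    comment: bytes
    zip64_eocd_offset: Optional[int]  # set only if a locator precedes it


def read_eocd(data: bytes) -> EndOfCentralDir:
    fixed = struct.calcsize(EOCD_FMT)
    # The comment is at most 65,535 bytes, so only the tail is searched.
    earliest = max(0, len(data) - fixed - 0xFFFF)
    for pos in range(len(data) - fixed, earliest - 1, -1):
        if data[pos:pos + 4] != b"PK\x05\x06":
            continue
        (_, disk, cd_disk, n_disk, n_total,
         cd_size, cd_offset, comment_len) = struct.unpack_from(EOCD_FMT, data, pos)
        # Heuristic: accept the signature only if its comment ends the data.
        if pos + fixed + comment_len != len(data):
            continue
        zip64_offset = None
        loc = pos - struct.calcsize(LOCATOR_FMT)
        if loc >= 0:
            loc_sig, _, offset, _ = struct.unpack_from(LOCATOR_FMT, data, loc)
            if loc_sig == LOCATOR_SIG:
                zip64_offset = offset
        return EndOfCentralDir(disk, cd_disk, n_disk, n_total, cd_size,
                               cd_offset, data[pos + fixed:], zip64_offset)
    raise ValueError("end of central directory record not found")
```

      When zip64_eocd_offset is set, fields holding 0xFFFF or 0xFFFFFFFF
      are expected to be read from the zip64 end of central directory
      record instead (see 4.4.1.4).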

4.4 Explanation of fields
-------------------------

   4.4.1 General notes on fields

      4.4.1.1  All fields unless otherwise noted are unsigned and stored
      in Intel low-byte:high-byte, low-word:high-word order.

      4.4.1.2  String fields are not null terminated, since the length
      is given explicitly.

      4.4.1.3  The entries in the central directory are not necessarily
      in the same order that files appear in the .ZIP file.

      4.4.1.4  If one of the fields in the end of central directory
      record is too small to hold required data, the field SHOULD be
      set to -1 (0xFFFF or 0xFFFFFFFF) and the ZIP64 format record
      SHOULD be created.

      4.4.1.5  The end of central directory record and the Zip64 end
      of central directory locator record MUST reside on the same
      disk when splitting or spanning an archive.

   4.4.2 version made by (2 bytes)

        4.4.2.1 The upper byte indicates the compatibility of the file
        attribute information.  If the external file attributes
        are compatible with MS-DOS and can be read by PKZIP for
        DOS version 2.04g, then this value will be zero.  If these
        attributes are not compatible, then this value will
        identify the host system on which the attributes are
        compatible.  Software can use this information to determine
        the line record format for text files etc.

        4.4.2.2 The current mappings are:

         0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
         1 - Amiga                     2 - OpenVMS
         3 - UNIX                      4 - VM/CMS
         5 - Atari ST                  6 - OS/2 H.P.F.S.
         7 - Macintosh                 8 - Z-System
         9 - CP/M                     10 - Windows NTFS
        11 - MVS (OS/390 - Z/OS)      12 - VSE
        13 - Acorn Risc               14 - VFAT
        15 - alternate MVS            16 - BeOS
        17 - Tandem                   18 - OS/400
        19 - OS X (Darwin)            20 thru 255 - unused

        4.4.2.3 The lower byte indicates the ZIP specification version
        (the version of this document) supported by the software
        used to encode the file.  The value/10 indicates the major
        version number, and the value mod 10 is the minor version
        number.
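        The split described in 4.4.2.1 and 4.4.2.3 is plain byte and
        integer arithmetic.  The Python sketch below uses a hypothetical
        function name and only a partial copy of the 4.4.2.2 table; the
        same value/10, value mod 10 mapping applies to the lower byte of
        "version needed to extract" (e.g. 45 means 4.5).

```python
from typing import Tuple

# Partial copy of the 4.4.2.2 mapping; extend with the remaining entries.
HOST_SYSTEMS = {
    0: "MS-DOS and OS/2", 1: "Amiga", 2: "OpenVMS", 3: "UNIX",
    10: "Windows NTFS", 19: "OS X (Darwin)",
}


def decode_version_made_by(value: int) -> Tuple[str, int, int]:
    """Split a 2-byte 'version made by' value into (host, major, minor)."""
    host = (value >> 8) & 0xFF   # upper byte: attribute compatibility
    spec = value & 0xFF          # lower byte: spec version * 10
    name = HOST_SYSTEMS.get(host, f"unknown ({host})")
    return name, spec // 10, spec % 10
```

        For example, 0x031E decodes to ("UNIX", 3, 0): host 3 and
        specification version 30, i.e. 3.0.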

   4.4.3 version needed to extract (2 bytes)

        4.4.3.1 The minimum supported ZIP specification version needed
        to extract the file, mapped as above.  This value is based on
        the specific format features a ZIP program MUST support to
        be able to extract the file.  If multiple features are
        applied to a file, the minimum version MUST be set to the
        version of the feature having the highest value.  New
        features or feature changes affecting the published format
        specification will be implemented using higher version
        numbers than the last published value to avoid conflict.

        4.4.3.2 Current minimum feature versions are as defined below:

         1.0 - Default value
         1.1 - File is a volume label
         2.0 - File is a folder (directory)
         2.0 - File is compressed using Deflate compression
         2.0 - File is encrypted using traditional PKWARE encryption
         2.1 - File is compressed using Deflate64(tm)
         2.5 - File is compressed using PKWARE DCL Implode
         2.7 - File is a patch data set
         4.5 - File uses ZIP64 format extensions
         4.6 - File is compressed using BZIP2 compression*
         5.0 - File is encrypted using DES
         5.0 - File is encrypted using 3DES
         5.0 - File is encrypted using original RC2 encryption
         5.0 - File is encrypted using RC4 encryption
         5.1 - File is encrypted using AES encryption
         5.1 - File is encrypted using corrected RC2 encryption**
         5.2 - File is encrypted using corrected RC2-64 encryption**
         6.1 - File is encrypted using non-OAEP key wrapping***
         6.2 - Central directory encryption
         6.3 - File is compressed using LZMA
         6.3 - File is compressed using PPMd+
         6.3 - File is encrypted using Blowfish
         6.3 - File is encrypted using Twofish

        4.4.3.3 Notes on version needed to extract

        * Early 7.x (pre-7.2) versions of PKZIP incorrectly set the
        version needed to extract for BZIP2 compression to be 50
        when it SHOULD have been 46.

        ** Refer to the section on Strong Encryption Specification
        for additional information regarding RC2 corrections.

        *** Certificate encryption using non-OAEP key wrapping is the
        intended mode of operation for all versions beginning with 6.1.
        OAEP key wrapping MUST only be used for backward
        compatibility when sending ZIP files to be opened by
        versions of PKZIP older than 6.1 (5.0 or 6.0).

        + Files compressed using PPMd MUST set the version
        needed to extract field to 6.3; however, not all ZIP
        programs enforce this, and some programs MAY be unable to
        decompress data files compressed using PPMd if this value
        is set.

        When using ZIP64 extensions, the corresponding value in the
        zip64 end of central directory record MUST also be set.
        This field SHOULD be set appropriately to indicate whether
        Version 1 or Version 2 format is in use.

   4.4.4 general purpose bit flag: (2 bytes)

        Bit 0: If set, indicates that the file is encrypted.

        (For Method 6 - Imploding)
        Bit 1: If the compression method used was type 6,
               Imploding, then this bit, if set, indicates
               an 8K sliding dictionary was used.  If clear,
               then a 4K sliding dictionary was used.

        Bit 2: If the compression method used was type 6,
               Imploding, then this bit, if set, indicates
               3 Shannon-Fano trees were used to encode the
               sliding dictionary output.  If clear, then 2
               Shannon-Fano trees were used.

        (For Methods 8 and 9 - Deflating)
        Bit 2  Bit 1
          0      0    Normal (-en) compression option was used.
          0      1    Maximum (-exx/-ex) compression option was used.
          1      0    Fast (-ef) compression option was used.
          1      1    Super Fast (-es) compression option was used.

        (For Method 14 - LZMA)
        Bit 1: If the compression method used was type 14,
               LZMA, then this bit, if set, indicates
               an end-of-stream (EOS) marker is used to
               mark the end of the compressed data stream.
               If clear, then an EOS marker is not present
               and the compressed data size must be known
               to extract.

        Note:  Bits 1 and 2 are undefined if the compression
               method is any other.

        Bit 3: If this bit is set, the fields CRC-32, compressed
               size and uncompressed size are set to zero in the
               local header.  The correct values are put in the
               data descriptor immediately following the compressed
               data.  (Note: PKZIP version 2.04g for DOS only
               recognizes this bit for method 8 compression, newer
               versions of PKZIP recognize this bit for any
               compression method.)

        Bit 4: Reserved for use with method 8, for enhanced
               deflating.

        Bit 5: If this bit is set, this indicates that the file is
               compressed patched data.  (Note: Requires PKZIP
               version 2.70 or greater)

        Bit 6: Strong encryption.  If this bit is set, you MUST
               set the version needed to extract value to at least
               50 and you MUST also set bit 0.  If AES encryption
               is used, the version needed to extract value MUST
               be at least 51. See the section describing the Strong
               Encryption Specification for details.  Refer to the
               section in this document entitled "Incorporating PKWARE
               Proprietary Technology into Your Product" for more
               information.

        Bit 7: Currently unused.

        Bit 8: Currently unused.

        Bit 9: Currently unused.

        Bit 10: Currently unused.

        Bit 11: Language encoding flag (EFS).  If this bit is set,
                the filename and comment fields for this file
                MUST be encoded using UTF-8. (see APPENDIX D)

        Bit 12: Reserved by PKWARE for enhanced compression.

        Bit 13: Set when encrypting the Central Directory to indicate
                selected data values in the Local Header are masked to
                hide their actual values.  See the section describing
                the Strong Encryption Specification for details.  Refer
                to the section in this document entitled "Incorporating
                PKWARE Proprietary Technology into Your Product" for
                more information.

        Bit 14: Reserved by PKWARE for alternate streams.

        Bit 15: Reserved by PKWARE.
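        Because the meaning of bits 1 and 2 depends on the compression
        method, a decoder has to consult both fields.  A minimal Python
        sketch follows; the function name and the result keys are
        hypothetical, and only the bits with a defined meaning above are
        reported.

```python
from typing import Dict


def describe_gp_flags(flags: int, method: int) -> Dict[str, object]:
    """Decode the general purpose bit flag for a given compression method."""
    info: Dict[str, object] = {
        "encrypted": bool(flags & 0x0001),          # bit 0
        "data_descriptor": bool(flags & 0x0008),    # bit 3
        "patched_data": bool(flags & 0x0020),       # bit 5
        "strong_encryption": bool(flags & 0x0040),  # bit 6
        "utf8_names": bool(flags & 0x0800),         # bit 11 (EFS)
        "cd_masked": bool(flags & 0x2000),          # bit 13
    }
    bit1 = bool(flags & 0x0002)
    bit2 = bool(flags & 0x0004)
    if method == 6:  # Imploding
        info["dictionary_size"] = 8192 if bit1 else 4096
        info["shannon_fano_trees"] = 3 if bit2 else 2
    elif method in (8, 9):  # Deflating: keyed by (bit 2, bit 1)
        info["deflate_option"] = {
            (False, False): "normal", (False, True): "maximum",
            (True, False): "fast", (True, True): "super fast",
        }[(bit2, bit1)]
    elif method == 14:  # LZMA
        info["lzma_eos_marker"] = bit1
    # Bits 1 and 2 are undefined for any other method, so they are ignored.
    return info
```

        For instance, flags 0x0808 with method 8 report a data
        descriptor (bit 3), UTF-8 names (bit 11) and normal deflation.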

   4.4.5 compression method: (2 bytes)

        0 - The file is stored (no compression)
        1 - The file is Shrunk
        2 - The file is Reduced with compression factor 1
        3 - The file is Reduced with compression factor 2
        4 - The file is Reduced with compression factor 3
        5 - The file is Reduced with compression factor 4
        6 - The file is Imploded
        7 - Reserved for Tokenizing compression algorithm
        8 - The file is Deflated
        9 - Enhanced Deflating using Deflate64(tm)
       10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
       11 - Reserved by PKWARE
       12 - File is compressed using BZIP2 algorithm
       13 - Reserved by PKWARE
       14 - LZMA
       15 - Reserved by PKWARE
       16 - IBM z/OS CMPSC Compression
       17 - Reserved by PKWARE
       18 - File is compressed using IBM TERSE (new)
       19 - IBM LZ77 z Architecture
       20 - deprecated (use method 93 for zstd)
       93 - Zstandard (zstd) Compression
       94 - MP3 Compression
       95 - XZ Compression
       96 - JPEG variant
       97 - WavPack compressed data
       98 - PPMd version I, Rev 1
       99 - AE-x encryption marker (see APPENDIX E)

       4.4.5.1 Methods 1-6 are legacy algorithms and are no longer
       recommended for use when compressing files.

   4.4.6 date and time fields: (2 bytes each)

       The date and time are encoded in standard MS-DOS format.
       If input came from standard input, the date and time are
       those at which compression was started for this data.
       If encrypting the central directory and general purpose bit
       flag 13 is set indicating masking, the value stored in the
       Local Header will be zero. The MS-DOS time format differs
       from more commonly used computer time formats such as
       UTC-based timestamps; for example, MS-DOS stores year values
       relative to 1980 and has 2-second precision.
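       The section refers to "standard MS-DOS format" without spelling
       it out.  The Python sketch below assumes the usual MS-DOS (FAT)
       packing: the date holds the day in bits 0-4, the month in bits
       5-8 and years since 1980 in bits 9-15; the time holds seconds/2
       in bits 0-4, minutes in bits 5-10 and hours in bits 11-15.  The
       function names are hypothetical, and time zone handling is left
       out.

```python
import datetime


def dos_to_datetime(dos_date: int, dos_time: int) -> datetime.datetime:
    """Decode the 2-byte MS-DOS date and time fields."""
    year = ((dos_date >> 9) & 0x7F) + 1980  # bits 9-15: years since 1980
    month = (dos_date >> 5) & 0x0F          # bits 5-8
    day = dos_date & 0x1F                   # bits 0-4
    hour = (dos_time >> 11) & 0x1F          # bits 11-15
    minute = (dos_time >> 5) & 0x3F         # bits 5-10
    second = (dos_time & 0x1F) * 2          # bits 0-4 hold seconds / 2
    return datetime.datetime(year, month, day, hour, minute, second)


def datetime_to_dos(dt: datetime.datetime) -> tuple:
    """Encode a datetime as (dos_date, dos_time), truncating odd seconds."""
    if not 1980 <= dt.year <= 2107:
        raise ValueError("MS-DOS dates cover 1980 through 2107 only")
    dos_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    dos_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return dos_date, dos_time
```

       A round trip through these two functions shows the 2-second
       precision: 13:45:31 comes back as 13:45:30.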

   4.4.7 CRC-32: (4 bytes)

       The CRC-32 algorithm was generously contributed by
       David Schwaderer and can be found in his excellent
       book "C Programmer's Guide to NetBIOS" published by
       Howard W. Sams & Co. Inc.  The 'magic number' for
       the CRC is 0xdebb20e3.  The proper CRC pre- and
       post-conditioning is used, meaning that the CRC register
       is pre-conditioned with all ones (a starting value
       of 0xffffffff) and the value is post-conditioned by
       taking the one's complement of the CRC residual.
       If bit 3 of the general purpose flag is set, this
       field is set to zero in the local header and the correct
       value is put in the data descriptor and in the central
       directory. When encrypting the central directory, if the
       local header is not in ZIP64 format and general purpose
       bit flag 13 is set indicating masking, the value stored
       in the Local Header will be zero.
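       A table-driven Python sketch of the conditioning described above.
       The section does not name the polynomial; the sketch assumes the
       standard reflected CRC-32 (polynomial 0x04C11DB7, processed
       LSB-first as 0xEDB88320), which is the variant zlib's crc32
       computes.  The names crc32 and append_crc are illustrative only.
       The magic number 0xdebb20e3 is the register value, before the
       final complement, left after running the CRC over data followed
       by its own little-endian CRC.

```python
import struct


def _make_table() -> list:
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # Reflected form of the CRC-32 polynomial 0x04C11DB7.
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _make_table()


def crc32(data: bytes, crc: int = 0) -> int:
    """CRC-32; pass a previous result as `crc` to continue a running CRC."""
    reg = crc ^ 0xFFFFFFFF            # pre-condition with all ones
    for b in data:
        reg = CRC_TABLE[(reg ^ b) & 0xFF] ^ (reg >> 8)
    return reg ^ 0xFFFFFFFF           # post-condition: one's complement


def append_crc(data: bytes) -> bytes:
    """Append the CRC in the little-endian order used by ZIP records."""
    return data + struct.pack("<I", crc32(data))
```

       The standard check value crc32(b"123456789") is 0xCBF43926, and
       crc32(append_crc(msg)) ^ 0xFFFFFFFF yields 0xDEBB20E3 for any msg.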

   4.4.8 compressed size: (4 bytes)
   4.4.9 uncompressed size: (4 bytes)

       The size of the file compressed (4.4.8) and uncompressed
       (4.4.9), respectively.  When a decryption header is present, it
       will be placed in front of the file data and the value of the
       compressed file size will include the bytes of the decryption
       header.  If bit 3 of the general purpose bit flag is set,
       these fields are set to zero in the local header and the
       correct values are put in the data descriptor and
       in the central directory.  If an archive is in ZIP64 format
       and the value in this field is 0xFFFFFFFF, the size will be
       in the corresponding 8 byte ZIP64 extended information
       extra field.  When encrypting the central directory, if the
       local header is not in ZIP64 format and general purpose bit
       flag 13 is set indicating masking, the value stored for the
       uncompressed size in the Local Header will be zero.

   4.4.10 file name length: (2 bytes)
   4.4.11 extra field length: (2 bytes)
   4.4.12 file comment length: (2 bytes)

       The length of the file name, extra field, and comment
       fields, respectively.  The combined length of any
       directory record and these three fields SHOULD NOT
       generally exceed 65,535 bytes.  If input came from standard
       input, the file name length is set to zero.


   4.4.13 disk number start: (2 bytes)

       The number of the disk on which this file begins.  If an
       archive is in ZIP64 format and the value in this field is
       0xFFFF, the actual disk number will be in the corresponding
       4 byte zip64 extended information extra field.

   4.4.14 internal file attributes: (2 bytes)

       Bits 1 and 2 are reserved for use by PKWARE.

       4.4.14.1 The lowest bit of this field indicates, if set,
       that the file is apparently an ASCII or text file.  If not
       set, the file apparently contains binary data.
       The remaining bits are unused in version 1.0.

       4.4.14.2 The 0x0002 bit of this field indicates, if set, that
       a 4 byte variable record length control field precedes each
       logical record indicating the length of the record. The
       record length control field is stored in little-endian byte
       order.  This flag is independent of text control characters,
       and if used in conjunction with text data, includes any
       control characters in the total length of the record. This
       value is provided for mainframe data transfer support.

   4.4.15 external file attributes: (4 bytes)

       The mapping of the external attributes is
       host-system dependent (see 'version made by').  For
       MS-DOS, the low order byte is the MS-DOS directory
       attribute byte.  If input came from standard input, this
       field is set to zero.

   4.4.16 relative offset of local header: (4 bytes)

       This is the offset from the start of the first disk on
       which this file appears, to where the local header SHOULD
       be found.  If an archive is in ZIP64 format and the value
       in this field is 0xFFFFFFFF, the actual offset will be in the
       corresponding 8 byte zip64 extended information extra field.

   4.4.17 file name: (Variable)

       4.4.17.1 The name of the file, with optional relative path.
       The path stored MUST NOT contain a drive or
       device letter, or a leading slash.  All slashes
       MUST be forward slashes '/' as opposed to
       backslashes '\' for compatibility with Amiga
       and UNIX file systems etc.  If input came from standard
       input, there is no file name field.
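       A writer can apply the 4.4.17.1 rules before storing a name.
       The Python sketch below is one possible policy, not a required
       one: it converts backslashes to forward slashes and rejects
       (rather than silently strips) a drive letter or a leading slash.
       The function name to_zip_name is hypothetical, and checks beyond
       this section, such as rejecting ".." components, are left out.

```python
import re

_DRIVE = re.compile(r"^[A-Za-z]:")  # e.g. "C:" at the start of the path


def to_zip_name(path: str) -> str:
    """Normalize a relative path for the file name field (4.4.17.1)."""
    name = path.replace("\\", "/")   # all slashes MUST be forward slashes
    if _DRIVE.match(name):
        raise ValueError(f"drive or device letter not allowed: {path!r}")
    if name.startswith("/"):
        raise ValueError(f"leading slash not allowed: {path!r}")
    return name
```

       With this policy, "docs\readme.txt" is stored as "docs/readme.txt",
       while a UNC path such as "\\server\share" is rejected because it
       normalizes to a leading slash.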

       4.4.17.2 If using the Central Directory Encryption Feature and
       general purpose bit flag 13 is set indicating masking, the file
       name stored in the Local Header will not be the actual file name.
       A masking value consisting of a unique hexadecimal value will
       be stored.  This value will be sequentially incremented for each
       file in the archive. See the section on the Strong Encryption
       Specification for details on retrieving the encrypted file name.
       Refer to the section in this document entitled "Incorporating PKWARE
       Proprietary Technology into Your Product" for more information.


   4.4.18 file comment: (Variable)

       The comment for this file.

   4.4.19 number of this disk: (2 bytes)

       The number of this disk, which contains the end of central
       directory record. If an archive is in ZIP64 format
       and the value in this field is 0xFFFF, the actual disk number
       will be in the corresponding 4 byte zip64 end of central
       directory field.


   4.4.20 number of the disk with the start of the central
            directory: (2 bytes)

       The number of the disk on which the central
       directory starts. If an archive is in ZIP64 format
       and the value in this field is 0xFFFF, the actual disk number
       will be in the corresponding 4 byte zip64 end of central
       directory field.

   4.4.21 total number of entries in the central dir on
          this disk: (2 bytes)

      The number of central directory entries on this disk.
      If an archive is in ZIP64 format and the value in
      this field is 0xFFFF, the actual count will be in the
      corresponding 8 byte zip64 end of central
      directory field.

   4.4.22 total number of entries in the central dir: (2 bytes)

      The total number of files in the .ZIP file. If an
      archive is in ZIP64 format and the value in this field
      is 0xFFFF, the actual count will be in the corresponding
      8 byte zip64 end of central directory field.

   4.4.23 size of the central directory: (4 bytes)

      The size (in bytes) of the entire central directory.
      If an archive is in ZIP64 format and the value in
      this field is 0xFFFFFFFF, the size will be in the
      corresponding 8 byte zip64 end of central
      directory field.

   4.4.24 offset of start of central directory with respect to
          the starting disk number: (4 bytes)

      Offset of the start of the central directory on the
      disk on which the central directory starts. If an
      archive is in ZIP64 format and the value in this
      field is 0xFFFFFFFF, the actual offset will be in the
      corresponding 8 byte zip64 end of central
      directory field.

   4.4.25 .ZIP file comment length: (2 bytes)

      The length of the comment for this .ZIP file.

   4.4.26 .ZIP file comment: (Variable)

      The comment for this .ZIP file.  ZIP file comment data
      is stored unsecured.  No encryption or data authentication
      is applied to this area at this time.  Confidential information
      SHOULD NOT be stored in this section.

   4.4.27 zip64 extensible data sector    (variable size)

      (currently reserved for use by PKWARE)


   4.4.28 extra field: (Variable)

     This SHOULD be used for storage expansion.  If additional
     information needs to be stored within a ZIP file for special
     application or platform needs, it SHOULD be stored here.
     Programs supporting earlier versions of this specification can
     then safely skip the file and find the next file or header.
     This field will be 0 length in version 1.0.

     Existing extra fields are defined in the section
     "Extensible data fields" that follows.

4.5 Extensible data fields
--------------------------

   4.5.1 In order to allow different programs and different types
   of information to be stored in the 'extra' field in .ZIP
   files, the following structure MUST be used for all
   programs storing data in this field:

       header1+data1 + header2+data2 . . .

   Each header MUST consist of:

       Header ID - 2 bytes
       Data Size - 2 bytes

   Note: all fields are stored in Intel low-byte/high-byte order.

   The Header ID field indicates the type of data that is in
   the following data block.

   Header IDs of 0 thru 31 are reserved for use by PKWARE.
   The remaining IDs can be used by third party vendors for
   proprietary usage.
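   The header1+data1 + header2+data2 layout is what lets a reader
   skip blocks it does not understand: Data Size alone tells it where
   the next header starts.  A minimal Python sketch, with a
   hypothetical function name, that yields each (Header ID, data)
   pair; treating leftover bytes after the last complete block as an
   error is a choice of this sketch, and a lenient reader might ignore
   them instead.

```python
import struct
from typing import Iterator, Tuple


def iter_extra_fields(extra: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (header_id, data) pairs from an extra field."""
    pos = 0
    while pos + 4 <= len(extra):
        # 2-byte Header ID and 2-byte Data Size, little-endian.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError(f"extra block 0x{header_id:04x} overruns the field")
        yield header_id, extra[pos:pos + size]
        pos += size
    if pos != len(extra):
        raise ValueError("trailing bytes after the last extra block")
```

   A caller would typically look up the IDs it supports (for example
   0x0001 for Zip64) and simply ignore the rest.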

   4.5.2 The current Header ID mappings defined by PKWARE are:

      0x0001        Zip64 extended information extra field
      0x0007        AV Info
      0x0008        Reserved for extended language encoding data (PFS)
                    (see APPENDIX D)
      0x0009        OS/2
      0x000a        NTFS
      0x000c        OpenVMS
      0x000d        UNIX
      0x000e        Reserved for file stream and fork descriptors
      0x000f        Patch Descriptor
      0x0014        PKCS#7 Store for X.509 Certificates
      0x0015        X.509 Certificate ID and Signature for
                    individual file
      0x0016        X.509 Certificate ID for Central Directory
      0x0017        Strong Encryption Header
      0x0018        Record Management Controls
      0x0019        PKCS#7 Encryption Recipient Certificate List
      0x0020        Reserved for Timestamp record
      0x0021        Policy Decryption Key Record
      0x0022        Smartcrypt Key Provider Record
      0x0023        Smartcrypt Policy Key Data Record
      0x0065        IBM S/390 (Z390), AS/400 (I400) attributes
                    - uncompressed
      0x0066        Reserved for IBM S/390 (Z390), AS/400 (I400)
                    attributes - compressed
      0x4690        POSZIP 4690 (reserved)


   4.5.3 -Zip64 Extended Information Extra Field (0x0001):

      The following is the layout of the zip64 extended
      information "extra" block. If one of the size or
      offset fields in the Local or Central directory
      record is too small to hold the required data,
      a Zip64 extended information record is created.
      The order of the fields in the zip64 extended
      information record is fixed, but the fields MUST
      only appear if the corresponding Local or Central
      directory record field is set to 0xFFFF or 0xFFFFFFFF.

      Note: all fields are stored in Intel low-byte/high-byte order.

        Value      Size       Description
        -----      ----       -----------
(ZIP64) 0x0001     2 bytes    Tag for this "extra" block type
        Size       2 bytes    Size of this "extra" block
        Original
        Size       8 bytes    Original uncompressed file size
        Compressed
        Size       8 bytes    Size of compressed data
        Relative Header
        Offset     8 bytes    Offset of local header record
        Disk Start
        Number     4 bytes    Number of the disk on which
                              this file starts

      This entry in the Local header MUST include BOTH original
      and compressed file size fields. If encrypting the
      central directory and bit 13 of the general purpose bit
      flag is set indicating masking, the value stored in the
      Local Header for the original file size will be zero.
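      A Python sketch of resolving the 0xFFFF/0xFFFFFFFF placeholders
      from the payload of a 0x0001 block (the bytes after the Tag and
      Size).  It follows the rule that a field is present only when the
      matching header field holds its placeholder, in the fixed order
      of the table.  parse_zip64_extra is a hypothetical name; for a
      Local header, which MUST carry both size fields, the caller would
      pass both sizes as 0xFFFFFFFF so that both 8-byte fields are read.

```python
import struct
from typing import Dict


def parse_zip64_extra(data: bytes, uncompressed: int, compressed: int,
                      header_offset: int = 0, disk_start: int = 0) -> Dict[str, int]:
    """Replace placeholder header values with those from a 0x0001 payload."""
    values = {"uncompressed": uncompressed, "compressed": compressed,
              "header_offset": header_offset, "disk_start": disk_start}
    # Fixed field order from the table: (name, placeholder, struct format).
    layout = [("uncompressed", 0xFFFFFFFF, "<Q"),
              ("compressed", 0xFFFFFFFF, "<Q"),
              ("header_offset", 0xFFFFFFFF, "<Q"),
              ("disk_start", 0xFFFF, "<I")]
    pos = 0
    for name, placeholder, fmt in layout:
        if values[name] != placeholder:
            continue  # field is absent when the header value fits
        size = struct.calcsize(fmt)
        if pos + size > len(data):
            raise ValueError(f"zip64 extra field too short for {name}")
        (values[name],) = struct.unpack_from(fmt, data, pos)
        pos += size
    return values
```

      For example, a central directory entry with only its offset set
      to 0xFFFFFFFF carries a single 8-byte field, which the sketch
      reads as the relative header offset.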


   4.5.4 -OS/2 Extra Field (0x0009):

      The following is the layout of the OS/2 attributes "extra"
      block.  (Last Revision 09/05/95)

      Note: all fields are stored in Intel low-byte/high-byte order.

        Value       Size          Description
        -----       ----          -----------
(OS/2)  0x0009      2 bytes       Tag for this "extra" block type
        TSize       2 bytes       Size for the following data block
        BSize       4 bytes       Uncompressed Block Size
        CType       2 bytes       Compression type
        EACRC       4 bytes       CRC value for uncompressed block
        (var)       variable      Compressed block

      The OS/2 extended attribute structure (FEA2LIST) is
      compressed and then stored in its entirety within this
      structure.  There will only ever be one "block" of data in
      the variable-length (var) field.

   4.5.5 -NTFS Extra Field (0x000a):

      The following is the layout of the NTFS attributes
      "extra" block. (Note: At this time the Mtime, Atime
      and Ctime values MAY be used on any WIN32 system.)

      Note: all fields are stored in Intel low-byte/high-byte order.

        Value      Size       Description
        -----      ----       -----------
(NTFS)  0x000a     2 bytes    Tag for this "extra" block type
        TSize      2 bytes    Size of the total "extra" block
        Reserved   4 bytes    Reserved for future use
        Tag1       2 bytes    NTFS attribute tag value #1
        Size1      2 bytes    Size of attribute #1, in bytes
        (var)      Size1      Attribute #1 data
        .
        .
        .
        TagN       2 bytes    NTFS attribute tag value #N
        SizeN      2 bytes    Size of attribute #N, in bytes
        (var)      SizeN      Attribute #N data

       For NTFS, values for Tag1 through TagN are as follows:
       (currently only one set of attributes is defined for NTFS)

         Tag        Size       Description
         -----      ----       -----------
         0x0001     2 bytes    Tag for attribute #1
         Size1      2 bytes    Size of attribute #1, in bytes
         Mtime      8 bytes    File last modification time
         Atime      8 bytes    File last access time
         Ctime      8 bytes    File creation time
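       A Python sketch of walking the NTFS block payload (the bytes
       after the 0x000a tag and TSize): it skips the 4-byte Reserved
       field, then reads Tag/Size/data triples, keeping only attribute
       tag 0x0001.  The function name is hypothetical.  The times are
       returned as raw 8-byte integers because this section does not
       state their unit; implementations commonly treat them as Windows
       FILETIME values.

```python
import struct
from typing import Dict


def parse_ntfs_extra(data: bytes) -> Dict[str, int]:
    """Return raw Mtime/Atime/Ctime from attribute tag 0x0001, if present."""
    times: Dict[str, int] = {}
    pos = 4  # skip the 4-byte Reserved field
    while pos + 4 <= len(data):
        tag, size = struct.unpack_from("<HH", data, pos)
        pos += 4
        body = data[pos:pos + size]
        if len(body) != size:
            raise ValueError("NTFS attribute overruns the extra block")
        if tag == 0x0001 and size >= 24:
            mtime, atime, ctime = struct.unpack_from("<QQQ", body)
            times.update(mtime=mtime, atime=atime, ctime=ctime)
        pos += size  # unknown tags are skipped using their Size field
    return times
```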
   1237 
   1238    4.5.6 -OpenVMS Extra Field (0x000c):
   1239 
   1240        The following is the layout of the OpenVMS attributes 
   1241        "extra" block.
   1242 
   1243        Note: all fields stored in Intel low-byte/high-byte order.
   1244 
   1245          Value      Size       Description
   1246          -----      ----       -----------
   1247  (VMS)   0x000c     2 bytes    Tag for this "extra" block type
   1248          TSize      2 bytes    Size of the total "extra" block
   1249          CRC        4 bytes    32-bit CRC for remainder of the block
   1250          Tag1       2 bytes    OpenVMS attribute tag value #1
   1251          Size1      2 bytes    Size of attribute #1, in bytes
   1252          (var)      Size1      Attribute #1 data
   1253          .
   1254          .
   1255          .
   1256          TagN       2 bytes    OpenVMS attribute tag value #N
   1257          SizeN      2 bytes    Size of attribute #N, in bytes
   1258          (var)      SizeN      Attribute #N data
   1259 
   1260        OpenVMS Extra Field Rules:
   1261 
   1262           4.5.6.1. There will be one or more attributes present, which 
   1263           will each be preceded by the above TagX & SizeX values.  
   1264           These values are identical to the ATR$C_XXXX and ATR$S_XXXX 
   1265           constants which are defined in ATR.H under OpenVMS C.  Neither 
   1266           of these values will ever be zero.
   1267 
   1268           4.5.6.2. No word alignment or padding is performed.
   1269 
   1270           4.5.6.3. A well-behaved PKZIP/OpenVMS program SHOULD NOT produce
   1271           more than one sub-block with the same TagX value.  Also, there MUST 
   1272           NOT be more than one "extra" block of type 0x000c in a particular 
   1273           directory record.
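Here is a minimal Python sketch of walking an OpenVMS block under the rules above. It assumes TSize counts the bytes after the TSize field, with the CRC included. It also assumes the CRC is the standard zip CRC-32 over the attribute bytes that follow it. Rejecting duplicate TagX values is a stricter choice than the SHOULD NOT in 4.5.6.3. The function name is hypothetical.

```python
import struct
import zlib


def parse_openvms_extra(block: bytes) -> dict:
    """Split an OpenVMS (0x000c) extra block into {TagX: attribute data}.

    Assumptions: TSize counts the bytes after the TSize field (CRC included),
    and CRC is the zip CRC-32 (zlib.crc32) of the attribute bytes after it.
    """
    if len(block) < 8:
        raise ValueError("block too short")
    header_id, tsize, crc = struct.unpack_from("<HHI", block, 0)
    if header_id != 0x000C or tsize < 4 or len(block) < 4 + tsize:
        raise ValueError("not a complete OpenVMS extra block")
    body = block[8:4 + tsize]
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    attrs = {}
    pos = 0
    while pos < len(body):
        if pos + 4 > len(body):
            raise ValueError("truncated TagX/SizeX")
        tag, size = struct.unpack_from("<HH", body, pos)
        if tag == 0 or size == 0:
            raise ValueError("TagX and SizeX are never zero (4.5.6.1)")
        pos += 4
        if pos + size > len(body):
            raise ValueError("attribute runs past the end of the block")
        if tag in attrs:
            raise ValueError(f"duplicate TagX {tag:#06x}")  # stricter than SHOULD NOT
        attrs[tag] = body[pos:pos + size]
        pos += size  # no word alignment or padding (4.5.6.2)
    return attrs
```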
   1274 
   1275    4.5.7 -UNIX Extra Field (0x000d):
   1276 
   1277         The following is the layout of the UNIX "extra" block.
   1278         Note: all fields are stored in Intel low-byte/high-byte 
   1279         order.
   1280 
   1281         Value       Size          Description
   1282         -----       ----          -----------
   1283 (UNIX)  0x000d      2 bytes       Tag for this "extra" block type
   1284         TSize       2 bytes       Size for the following data block
   1285         Atime       4 bytes       File last access time
   1286         Mtime       4 bytes       File last modification time
   1287         Uid         2 bytes       File user ID
   1288         Gid         2 bytes       File group ID
   1289         (var)       variable      Variable length data field
   1290 
   1291         The variable length data field will contain file type 
   1292         specific data.  Currently the only values allowed are
   1293         the original "linked to" file names for hard or symbolic 
   1294         links, and the major and minor device node numbers for
   1295         character and block device nodes.  Since device nodes
   1296         cannot be either symbolic or hard links, only one set of
   1297         variable length data is stored.  Link files will have the
   1298         name of the original file stored.  This name is NOT NULL
   1299         terminated.  Its size can be determined by checking TSize -
   1300         12.  Device entries will have eight bytes stored as two 4
   1301         byte entries (in little endian format).  The first entry
   1302         will be the major device number, and the second the minor
   1303         device number.
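The fixed fields and the TSize - 12 rule can be read as follows. The block does not say whether its variable data is a link name or a device pair. This Python sketch therefore takes a hypothetical `kind` argument, which the caller is assumed to derive from the entry's file type (for example, from its external file attributes).

```python
import struct


def parse_unix_extra(block: bytes, kind: str = "link") -> dict:
    """Decode a UNIX (0x000d) extra block.

    `kind` is a hypothetical caller-supplied hint ("link" or "device"),
    since the block itself does not record which form its variable data takes.
    """
    if len(block) < 16:
        raise ValueError("block too short")
    header_id, tsize = struct.unpack_from("<HH", block, 0)
    if header_id != 0x000D or tsize < 12 or len(block) < 4 + tsize:
        raise ValueError("not a complete UNIX extra block")
    atime, mtime, uid, gid = struct.unpack_from("<IIHH", block, 4)
    var = block[16:4 + tsize]  # TSize - 12 bytes of type-specific data
    out = {"atime": atime, "mtime": mtime, "uid": uid, "gid": gid}
    if kind == "link":
        out["link_target"] = var  # original file name, not NUL-terminated
    elif kind == "device" and len(var) == 8:
        out["major"], out["minor"] = struct.unpack("<II", var)
    return out
```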
   1304                           
   1305    4.5.8 -PATCH Descriptor Extra Field (0x000f):
   1306 
   1307         4.5.8.1 The following is the layout of the Patch Descriptor 
   1308         "extra" block.
   1309 
   1310         Note: all fields stored in Intel low-byte/high-byte order.
   1311 
   1312         Value     Size     Description
   1313         -----     ----     -----------
   1314 (Patch) 0x000f    2 bytes  Tag for this "extra" block type
   1315         TSize     2 bytes  Size of the total "extra" block
   1316         Version   2 bytes  Version of the descriptor
   1317         Flags     4 bytes  Actions and reactions (see below) 
   1318         OldSize   4 bytes  Size of the file about to be patched 
   1319         OldCRC    4 bytes  32-bit CRC of the file to be patched 
   1320         NewSize   4 bytes  Size of the resulting file 
   1321         NewCRC    4 bytes  32-bit CRC of the resulting file 
   1322 
   1323         4.5.8.2 Actions and reactions
   1324 
   1325         Bits          Description
   1326         ----          ----------------
   1327         0             Use for auto detection
   1328         1             Treat as a self-patch
   1329         2-3           RESERVED
   1330         4-5           Action (see below)
   1331         6-7           RESERVED
   1332         8-9           Reaction (see below) to absent file 
   1333         10-11         Reaction (see below) to newer file
   1334         12-13         Reaction (see below) to unknown file
   1335         14-15         RESERVED
   1336         16-31         RESERVED
   1337 
   1338            4.5.8.2.1 Actions
   1339 
   1340            Action       Value
   1341            ------       ----- 
   1342            none         0
   1343            add          1
   1344            delete       2
   1345            patch        3
   1346 
   1347            4.5.8.2.2 Reactions
   1348         
   1349            Reaction     Value
   1350            --------     -----
   1351            ask          0
   1352            skip         1
   1353            ignore       2
   1354            fail         3
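The Flags word can be unpacked by masking the two-bit groups listed above. This Python sketch only decodes the documented bit layout. It does not implement patching, which 4.5.8.3 identifies as licensed technology. The dictionary keys and function names are hypothetical.

```python
import struct

ACTIONS = {0: "none", 1: "add", 2: "delete", 3: "patch"}
REACTIONS = {0: "ask", 1: "skip", 2: "ignore", 3: "fail"}


def decode_patch_flags(flags: int) -> dict:
    """Unpack the documented bit groups of the Patch Descriptor Flags word."""
    return {
        "auto_detect": bool(flags & 0x0001),            # bit 0
        "self_patch": bool(flags & 0x0002),             # bit 1
        "action": ACTIONS[(flags >> 4) & 0x3],          # bits 4-5
        "on_absent": REACTIONS[(flags >> 8) & 0x3],     # bits 8-9
        "on_newer": REACTIONS[(flags >> 10) & 0x3],     # bits 10-11
        "on_unknown": REACTIONS[(flags >> 12) & 0x3],   # bits 12-13
    }


def parse_patch_descriptor(block: bytes) -> dict:
    """Read the fixed 26-byte 0x000f layout (all fields little-endian)."""
    (header_id, tsize, version, flags,
     old_size, old_crc, new_size, new_crc) = struct.unpack_from("<HHHIIIII", block, 0)
    if header_id != 0x000F:
        raise ValueError("not a Patch Descriptor block")
    return {"version": version, "flags": decode_patch_flags(flags),
            "old_size": old_size, "old_crc": old_crc,
            "new_size": new_size, "new_crc": new_crc}
```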
   1355 
   1356         4.5.8.3 Patch support is provided by PKPatchMaker(tm) technology 
   1357         and is covered under U.S. Patents and Patents Pending. The use or 
   1358         implementation in a product of certain technological aspects set
   1359         forth in the current APPNOTE, including those with regard to 
   1360         strong encryption or patching requires a license from PKWARE.  
   1361         Refer to the section in this document entitled "Incorporating 
   1362         PKWARE Proprietary Technology into Your Product" for more 
   1363         information. 
   1364 
   1365    4.5.9 -PKCS#7 Store for X.509 Certificates (0x0014):
   1366 
   1367         This field MUST contain information about each of the certificates
   1368         that files MAY be signed with. When the Central Directory Encryption
   1369         feature is enabled for a ZIP file, this record will appear in 
   1370         the Archive Extra Data Record, otherwise it will appear in the 
   1371         first central directory record and will be ignored in any 
   1372         other record.  
   1373 
   1374                           
   1375         Note: all fields stored in Intel low-byte/high-byte order.
   1376 
   1377         Value     Size     Description
   1378         -----     ----     -----------
   1379 (Store) 0x0014    2 bytes  Tag for this "extra" block type
   1380         TSize     2 bytes  Size of the store data
   1381         TData     TSize    Data about the store
   1382 
   1383 
   1384    4.5.10 -X.509 Certificate ID and Signature for individual file (0x0015):
   1385 
   1386         This field contains the information about which certificate in 
   1387         the PKCS#7 store was used to sign a particular file. It also 
   1388         contains the signature data. This field can appear multiple 
   1389         times, but can only appear once per certificate.
   1390 
   1391         Note: all fields stored in Intel low-byte/high-byte order.
   1392 
   1393         Value     Size     Description
   1394         -----     ----     -----------
   1395 (CID)   0x0015    2 bytes  Tag for this "extra" block type
   1396         TSize     2 bytes  Size of data that follows
   1397         TData     TSize    Signature Data
   1398 
   1399    4.5.11 -X.509 Certificate ID and Signature for central directory (0x0016):
   1400 
   1401         This field contains the information about which certificate in 
   1402         the PKCS#7 store was used to sign the central directory structure.
   1403         When the Central Directory Encryption feature is enabled for a 
   1404         ZIP file, this record will appear in the Archive Extra Data Record, 
   1405         otherwise it will appear in the first central directory record.
   1406 
   1407         Note: all fields stored in Intel low-byte/high-byte order.
   1408 
   1409         Value     Size     Description
   1410         -----     ----     -----------
   1411 (CDID)  0x0016    2 bytes  Tag for this "extra" block type
   1412         TSize     2 bytes  Size of data that follows
   1413         TData     TSize    Data
   1414 
   1415    4.5.12 -Strong Encryption Header (0x0017):
   1416 
   1417         Value     Size     Description
   1418         -----     ----     -----------
   1419         0x0017    2 bytes  Tag for this "extra" block type
   1420         TSize     2 bytes  Size of data that follows
   1421         Format    2 bytes  Format definition for this record
   1422         AlgID     2 bytes  Encryption algorithm identifier
   1423         Bitlen    2 bytes  Bit length of encryption key
   1424         Flags     2 bytes  Processing flags
   1425         CertData  TSize-8  Certificate decryption extra field data
   1426                            (refer to the explanation for CertData
   1427                             in the section describing the 
   1428                             Certificate Processing Method under 
   1429                             the Strong Encryption Specification)
   1430 
   1431         See the section describing the Strong Encryption Specification 
   1432         for details.  Refer to the section in this document entitled 
   1433         "Incorporating PKWARE Proprietary Technology into Your Product" 
   1434         for more information.
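The following Python sketch splits the Strong Encryption Header into its fixed fields and the TSize-8 bytes of CertData. It assumes little-endian fields, as in the surrounding records, because this table does not restate the byte order. Format, AlgID, and Flags are returned uninterpreted; their meanings come from the Strong Encryption Specification. The function name is hypothetical.

```python
import struct


def parse_strong_encryption_header(block: bytes) -> dict:
    """Split a 0x0017 block into its fixed fields and CertData.

    Assumes little-endian fields and that TSize counts the bytes after TSize,
    so CertData is the remaining TSize - 8 bytes.
    """
    if len(block) < 12:
        raise ValueError("block too short")
    header_id, tsize, fmt, alg_id, bitlen, flags = struct.unpack_from("<6H", block, 0)
    if header_id != 0x0017 or tsize < 8 or len(block) < 4 + tsize:
        raise ValueError("not a complete Strong Encryption Header")
    return {"format": fmt, "alg_id": alg_id, "bitlen": bitlen,
            "flags": flags, "cert_data": block[12:4 + tsize]}
```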
   1435 
   1436    4.5.13 -Record Management Controls (0x0018):
   1437 
   1438           Value     Size     Description
   1439           -----     ----     -----------
   1440 (Rec-CTL) 0x0018    2 bytes  Tag for this "extra" block type
   1441           CSize     2 bytes  Size of total extra block data
   1442           Tag1      2 bytes  Record control attribute 1
   1443           Size1     2 bytes  Size of attribute 1, in bytes
   1444           Data1     Size1    Attribute 1 data
   1445           .
   1446           .
   1447           .
   1448           TagN      2 bytes  Record control attribute N
   1449           SizeN     2 bytes  Size of attribute N, in bytes
   1450           DataN     SizeN    Attribute N data
   1451 
   1452 
   1453    4.5.14 -PKCS#7 Encryption Recipient Certificate List (0x0019): 
   1454 
   1455         This field MAY contain information about each of the certificates
   1456         used in encryption processing and it can be used to identify who is
   1457         allowed to decrypt encrypted files.  This field SHOULD only appear 
   1458         in the archive extra data record. This field is not required and 
   1459         serves only to aid archive modifications by preserving public 
   1460         encryption key data. Individual security requirements may dictate 
   1461         that this data be omitted to deter information exposure.
   1462 
   1463         Note: all fields stored in Intel low-byte/high-byte order.
   1464 
   1465          Value     Size     Description
   1466          -----     ----     -----------
   1467 (CStore) 0x0019    2 bytes  Tag for this "extra" block type
   1468          TSize     2 bytes  Size of the store data
   1469          TData     TSize    Data about the store
   1470 
   1471          TData:
   1472 
   1473          Value     Size     Description
   1474          -----     ----     -----------
   1475          Version   2 bytes  Format version number - MUST be 0x0001 at this time
   1476          CStore    (var)    PKCS#7 data blob
   1477 
   1478          See the section describing the Strong Encryption Specification 
   1479          for details.  Refer to the section in this document entitled 
   1480          "Incorporating PKWARE Proprietary Technology into Your Product" 
   1481          for more information.
   1482 
   1483    4.5.15 -MVS Extra Field (0x0065):
   1484 
   1485         The following is the layout of the MVS "extra" block.
   1486         Note: Some fields are stored in Big Endian format.
   1487         All text is in EBCDIC format unless otherwise specified.
   1488         Value     Size      Description
   1489         -----     ----      -----------
   1490 (MVS)   0x0065    2 bytes   Tag for this "extra" block type
   1491         TSize     2 bytes   Size for the following data block
   1492         ID        4 bytes   EBCDIC "Z390" 0xE9F3F9F0 or
   1493                             "T4MV" for TargetFour
   1494         (var)     TSize-4   Attribute data (see APPENDIX B)
   1495 
   1496 
   1497    4.5.16 -OS/400 Extra Field (0x0065):
   1498 
   1499         The following is the layout of the OS/400 "extra" block.
   1500         Note: Some fields are stored in Big Endian format.
   1501         All text is in EBCDIC format unless otherwise specified.
   1502 
   1503         Value     Size       Description
   1504         -----     ----       -----------
   1505 (OS400) 0x0065    2 bytes    Tag for this "extra" block type
   1506         TSize     2 bytes    Size for the following data block
   1507         ID        4 bytes    EBCDIC "I400" 0xC9F4F0F0 or
   1508                              "T4MV" for TargetFour
   1509         (var)     TSize-4    Attribute data (see APPENDIX A)
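The MVS and OS/400 blocks share Header ID 0x0065, so a reader must check the 4-byte ID to tell them apart. This Python sketch builds the EBCDIC IDs with Python's cp037 codec, which reproduces the hex values given above. It assumes the Header ID is little-endian, like other extra field headers. The "T4MV" TargetFour ID alone cannot show which of the two layouts follows. The names are hypothetical.

```python
import struct

# EBCDIC IDs from 4.5.15 and 4.5.16, encoded with Python's cp037 codec.
MVS_ID = "Z390".encode("cp037")          # E9 F3 F9 F0
OS400_ID = "I400".encode("cp037")        # C9 F4 F0 F0
TARGETFOUR_ID = "T4MV".encode("cp037")


def classify_0065(block: bytes) -> str:
    """Tell the MVS and OS/400 layouts apart by their 4-byte ID."""
    if len(block) < 8:
        raise ValueError("block too short")
    (header_id,) = struct.unpack_from("<H", block, 0)  # assumed little-endian
    if header_id != 0x0065:
        raise ValueError("not a 0x0065 block")
    ident = block[4:8]
    if ident == MVS_ID:
        return "MVS"        # attribute data per APPENDIX B
    if ident == OS400_ID:
        return "OS/400"     # attribute data per APPENDIX A
    if ident == TARGETFOUR_ID:
        return "TargetFour" # ID alone does not say MVS or OS/400
    return "unknown"
```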
   1510 
   1511    4.5.17 -Policy Decryption Key Record Extra Field (0x0021):
   1512 
   1513         The following is the layout of the Policy Decryption Key "extra" block.
   1514         TData is a variable length, variable content field.  It holds
   1515         information about encryptions and/or encryption key sources.
   1516         Contact PKWARE for information on current TData structures.
   1517         Information in this "extra" block may alternatively be placed
   1518         within comment fields.  Refer to the section in this document 
   1519         entitled "Incorporating PKWARE Proprietary Technology into Your 
   1520         Product" for more information.
   1521 
   1522         Value     Size       Description
   1523         -----     ----       -----------
   1524         0x0021    2 bytes    Tag for this "extra" block type
   1525         TSize     2 bytes    Size for the following data block
   1526         TData     TSize      Data about the key
   1527 
   1528    4.5.18 -Key Provider Record Extra Field (0x0022):
   1529 
   1530         The following is the layout of the Key Provider "extra" block.
   1531         TData is a variable length, variable content field.  It holds
   1532         information about encryptions and/or encryption key sources.
   1533         Contact PKWARE for information on current TData structures.
   1534         Information in this "extra" block may alternatively be placed
   1535         within comment fields.  Refer to the section in this document 
   1536         entitled "Incorporating PKWARE Proprietary Technology into Your 
   1537         Product" for more information.
   1538 
   1539         Value     Size       Description
   1540         -----     ----       -----------
   1541         0x0022    2 bytes    Tag for this "extra" block type
   1542         TSize     2 bytes    Size for the following data block
   1543         TData     TSize      Data about the key
   1544 
   1545    4.5.19 -Policy Key Data Record Extra Field (0x0023):
   1546 
   1547         The following is the layout of the Policy Key Data "extra" block.
   1548         TData is a variable length, variable content field.  It holds
   1549         information about encryptions and/or encryption key sources.
   1550         Contact PKWARE for information on current TData structures.
   1551         Information in this "extra" block may alternatively be placed
   1552         within comment fields.  Refer to the section in this document 
   1553         entitled "Incorporating PKWARE Proprietary Technology into Your 
   1554         Product" for more information.
   1555 
   1556         Value     Size       Description
   1557         -----     ----       -----------
   1558         0x0023    2 bytes    Tag for this "extra" block type
   1559         TSize     2 bytes    Size for the following data block
   1560         TData     TSize      Data about the key
   1561 
   1562 4.6 Third Party Mappings
   1563 ------------------------
   1564                  
   1565    4.6.1 Third party mappings commonly used are:
   1566 
   1567           0x07c8        Macintosh
   1568           0x1986        Pixar USD header ID
   1569           0x2605        ZipIt Macintosh
   1570           0x2705        ZipIt Macintosh 1.3.5+
   1571           0x2805        ZipIt Macintosh 1.3.5+
   1572           0x334d        Info-ZIP Macintosh
   1573           0x4154        Tandem
   1574           0x4341        Acorn/SparkFS 
   1575           0x4453        Windows NT security descriptor (binary ACL)
   1576           0x4704        VM/CMS
   1577           0x470f        MVS
   1578           0x4854        THEOS (old?)
   1579           0x4b46        FWKCS MD5 (see below)
   1580           0x4c41        OS/2 access control list (text ACL)
   1581           0x4d49        Info-ZIP OpenVMS 
   1582           0x4d63        Macintosh Smartzip (??)
   1583           0x4f4c        Xceed original location extra field
   1584           0x5356        AOS/VS (ACL)
   1585           0x5455        extended timestamp
   1586           0x554e        Xceed unicode extra field
   1587           0x5855        Info-ZIP UNIX (original, also OS/2, NT, etc)
   1588           0x6375        Info-ZIP Unicode Comment Extra Field
   1589           0x6542        BeOS/BeBox
   1590           0x6854        THEOS
   1591           0x7075        Info-ZIP Unicode Path Extra Field
   1592           0x7441        AtheOS/Syllable
   1593           0x756e        ASi UNIX
   1594           0x7855        Info-ZIP UNIX (new)
   1595           0x7875        Info-ZIP UNIX (newer UID/GID)
   1596           0xa11e        Data Stream Alignment (Apache Commons-Compress)
   1597           0xa220        Microsoft Open Packaging Growth Hint
   1598           0xcafe        Java JAR file Extra Field Header ID
   1599           0xd935        Android ZIP Alignment Extra Field        
   1600           0xe57a        Korean ZIP code page info
   1601           0xfd4a        SMS/QDOS
   1602           0x9901        AE-x encryption structure (see APPENDIX E)
   1603           0x9902        unknown
   1604 
   1605 
   1606    Detailed descriptions of Extra Fields defined by third 
   1607    party mappings will be documented as information on
   1608    these data structures is made available to PKWARE.  
   1609    PKWARE does not guarantee the accuracy of any published
   1610    third party data.
   1611 
   1612    4.6.2 Third-party Extra Fields MUST include a Header ID using
   1613    the format defined in the section of this document 
   1614    titled Extensible Data Fields (section 4.5).
   1615 
   1616    The Data Size field indicates the size of the following
   1617    data block. Programs can use this value to skip to the
   1618    next header block, passing over any data blocks that are
   1619    not of interest.
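The skip-by-Data-Size rule lets a reader walk an extra field area without understanding every block. Below is a minimal Python sketch. It assumes the little-endian Header ID/Data Size pair defined in section 4.5, and the generator name is hypothetical.

```python
import struct
from typing import Iterator, Tuple


def iter_extra_fields(extra: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (header_id, data) pairs from a raw extra field area."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError(f"field {header_id:#06x} overruns the extra area")
        yield header_id, extra[pos:pos + size]
        pos += size  # Data Size lets the reader skip blocks it does not know
```

A caller can then dispatch on `header_id` and ignore any ID it does not recognize.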
   1620 
   1621    Note: As stated above, the size of the entire .ZIP file
   1622          header, including the file name, comment, and extra
   1623          field SHOULD NOT exceed 64K in size.
   1624 
   1625    4.6.3 In case two different programs appropriate the same
   1626    Header ID value, each program SHOULD place a unique signature
   1627    of at least two bytes in
   1628    size (and preferably 4 bytes or bigger) at the start of
   1629    each data area.  Every program SHOULD verify that its
   1630    unique signature is present, in addition to the Header ID
   1631    value being correct, before assuming that it is a block of
   1632    known type.
   1633          
   1634    Third-party Mappings:
   1635    Not all third-party extra field mappings are documented here.
   1636           
   1637    4.6.4 -ZipIt Macintosh Extra Field (long) (0x2605):
   1638 
   1639       The following is the layout of the ZipIt extra block 
   1640       for Macintosh. The local-header and central-header versions 
   1641       are identical. This block MUST be present if the file is 
   1642       stored MacBinary-encoded and it SHOULD NOT be used if the file 
   1643       is not stored MacBinary-encoded.
   1644 
   1645           Value         Size        Description
   1646           -----         ----        -----------
   1647   (Mac2)  0x2605        Short       tag for this extra block type
   1648           TSize         Short       total data size for this block
   1649           "ZPIT"        beLong      extra-field signature
   1650           FnLen         Byte        length of FileName
   1651           FileName      variable    full Macintosh filename
   1652           FileType      Byte[4]     four-byte Mac file type string
   1653           Creator       Byte[4]     four-byte Mac creator string
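The following Python sketch reads the long ZipIt block. It also performs the signature check recommended in 4.6.3, so that another program's use of 0x2605 is not misread. It assumes "Short" means a little-endian 16-bit value, as opposed to the beShort/beLong fields. The Macintosh name and type/creator codes are returned as raw bytes because their character set is not specified here. The function name is hypothetical.

```python
import struct


def parse_zipit_long(block: bytes) -> dict:
    """Decode a ZipIt 0x2605 block after checking its "ZPIT" signature (4.6.3)."""
    header_id, tsize = struct.unpack_from("<HH", block, 0)  # "Short" assumed little-endian
    data = block[4:4 + tsize]
    if header_id != 0x2605 or len(data) < 5 or data[:4] != b"ZPIT":
        raise ValueError("not a ZipIt 0x2605 block")
    name_end = 5 + data[4]  # FnLen byte follows the signature
    if len(data) < name_end + 8:
        raise ValueError("block too short for FileType and Creator")
    return {
        "filename": data[5:name_end],  # raw Macintosh name bytes
        "file_type": data[name_end:name_end + 4],
        "creator": data[name_end + 4:name_end + 8],
    }
```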
   1654 
   1655 
   1656    4.6.5 -ZipIt Macintosh Extra Field (short, for files) (0x2705):
   1657 
   1658       The following is the layout of a shortened variant of the
   1659       ZipIt extra block for Macintosh (without "full name" entry).
   1660       This variant is used by ZipIt 1.3.5 and newer for entries of
   1661       files (not directories) that do not have a MacBinary encoded
   1662       file. The local-header and central-header versions are identical.
   1663 
   1664          Value         Size        Description
   1665          -----         ----        -----------
   1666  (Mac2b) 0x2705        Short       tag for this extra block type
   1667          TSize         Short       total data size for this block (12)
   1668          "ZPIT"        beLong      extra-field signature
   1669          FileType      Byte[4]     four-byte Mac file type string
   1670          Creator       Byte[4]     four-byte Mac creator string
   1671          fdFlags       beShort     attributes from FInfo.frFlags,
   1672                                    MAY be omitted
   1673          0x0000        beShort     reserved, MAY be omitted
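For the short file variant, the optional trailing fields exist only if TSize leaves room for them. This Python sketch reads fdFlags as a big-endian short when at least 14 data bytes are present and otherwise reports None. The same "Short is little-endian" assumption applies, and the function name is hypothetical.

```python
import struct


def parse_zipit_short_file(block: bytes) -> dict:
    """Decode a ZipIt 0x2705 block; fdFlags is optional."""
    header_id, tsize = struct.unpack_from("<HH", block, 0)  # "Short" assumed little-endian
    data = block[4:4 + tsize]
    if header_id != 0x2705 or len(data) < 12 or data[:4] != b"ZPIT":
        raise ValueError("not a ZipIt 0x2705 block")
    out = {"file_type": data[4:8], "creator": data[8:12], "fd_flags": None}
    if len(data) >= 14:  # optional fdFlags (beShort)
        (out["fd_flags"],) = struct.unpack_from(">H", data, 12)
    return out
```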
   1674 
   1675 
   1676    4.6.6 -ZipIt Macintosh Extra Field (short, for directories) (0x2805):
   1677 
   1678       The following is the layout of a shortened variant of the
   1679       ZipIt extra block for Macintosh used only for directory
   1680       entries. This variant is used by ZipIt 1.3.5 and newer to 
   1681       save some optional Mac-specific information about directories.
   1682       The local-header and central-header versions are identical.
   1683 
   1684          Value         Size        Description
   1685          -----         ----        -----------
   1686  (Mac2c) 0x2805        Short       tag for this extra block type
   1687          TSize         Short       total data size for this block (12)
   1688          "ZPIT"        beLong      extra-field signature
   1689          frFlags       beShort     attributes from DInfo.frFlags, MAY
   1690                                    be omitted
   1691          View          beShort     ZipIt view flag, MAY be omitted
   1692 
   1693 
   1694      The View field specifies ZipIt-internal settings as follows:
   1695 
   1696      Bits of the View field:
   1697         bit 0           if set, the folder is shown expanded (open)
   1698                         when the archive contents are viewed in ZipIt.
   1699         bits 1-15       reserved, zero.
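The directory variant and its View flag can be decoded the same way. In this Python sketch, frFlags and View are read only when TSize covers them. Bit 0 of View is reported as the folder's expanded state, and the reserved bits are ignored. The function name is hypothetical.

```python
import struct


def parse_zipit_dir(block: bytes) -> dict:
    """Decode a ZipIt 0x2805 directory block; frFlags and View are optional."""
    header_id, tsize = struct.unpack_from("<HH", block, 0)  # "Short" assumed little-endian
    data = block[4:4 + tsize]
    if header_id != 0x2805 or data[:4] != b"ZPIT":
        raise ValueError("not a ZipIt 0x2805 block")
    fr_flags = struct.unpack_from(">H", data, 4)[0] if len(data) >= 6 else None
    view = struct.unpack_from(">H", data, 6)[0] if len(data) >= 8 else None
    return {
        "fr_flags": fr_flags,
        "view": view,
        "expanded": view is not None and bool(view & 0x0001),  # bit 0
    }
```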
   1700 
   1701 
   1702    4.6.7 -FWKCS MD5 Extra Field (0x4b46):
   1703 
   1704       The FWKCS Contents_Signature System, used in
   1705       automatically identifying files independent of file name,
   1706       optionally adds and uses an extra field to support the
   1707       rapid creation of an enhanced contents_signature:
   1708 
   1709               Header ID = 0x4b46
   1710               Data Size = 0x0013
   1711               Preface   = 'M','D','5'
   1712               followed by 16 bytes containing the uncompressed file's
   1713               128-bit MD5 hash(1), low byte first.
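The FWKCS field is built by prefixing the 16-byte MD5 digest with "MD5". This Python sketch assumes "low byte first" refers to RFC 1321's output order (low-order byte of A first), which is the order hashlib's digest() returns. The sketch only builds the field and does not update any central directory entry. The function name is hypothetical.

```python
import hashlib
import struct


def make_fwkcs_md5_field(uncompressed: bytes) -> bytes:
    """Build a complete 0x4b46 extra field (header + 19 data bytes)."""
    digest = hashlib.md5(uncompressed).digest()  # 16 bytes, RFC 1321 order
    data = b"MD5" + digest                       # 3 + 16 = 0x0013 bytes
    return struct.pack("<HH", 0x4B46, len(data)) + data
```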
   1714 
   1715       When FWKCS revises a .ZIP file central directory to add
   1716       this extra field for a file, it also replaces the
   1717       central directory entry for that file's uncompressed
   1718       file length with a measured value.
   1719 
   1720       FWKCS provides an option to strip this extra field, if
   1721       present, from a .ZIP file central directory. In adding
   1722       this extra field, FWKCS preserves .ZIP file Authenticity
   1723       Verification; if stripping this extra field, FWKCS
   1724       preserves all versions of AV through PKZIP version 2.04g.
   1725 
   1726       FWKCS, and FWKCS Contents_Signature System, are
   1727       trademarks of Frederick W. Kantor.
   1728 
   1729       (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer
   1730           Science and RSA Data Security, Inc., April 1992.
   1731           ll.76-77: "The MD5 algorithm is being placed in the
   1732           public domain for review and possible adoption as a
   1733           standard."
   1734 
   1735 
   1736    4.6.8 -Info-ZIP Unicode Comment Extra Field (0x6375):
   1737 
   1738       Stores the UTF-8 version of the file comment as stored in the
   1739       central directory header. (Last Revision 20070912)
   1740 
   1741          Value         Size        Description
   1742          -----         ----        -----------
   1743   (UCom) 0x6375        Short       tag for this extra block type ("uc")
   1744          TSize         Short       total data size for this block
   1745          Version       1 byte      version of this extra field, currently 1
   1746          ComCRC32      4 bytes     Comment Field CRC32 Checksum
   1747          UnicodeCom    Variable    UTF-8 version of the entry comment
   1748 
   1749        Currently Version is set to the number 1.  If there is a need
   1750        to change this field, the version will be incremented.  Changes
   1751        might not be backward compatible, so this extra field SHOULD NOT be
   1752        used if the version is not recognized.
   1753 
   1754        The ComCRC32 is the standard zip CRC32 checksum of the File Comment
   1755        field in the central directory header.  This is used to verify that
   1756        the comment field has not changed since the Unicode Comment extra field
   1757        was created.  This can happen if a utility changes the File Comment 
   1758        field but does not update the UTF-8 Comment extra field.  If the CRC 
   1759        check fails, this Unicode Comment extra field SHOULD be ignored and 
   1760        the File Comment field in the header SHOULD be used instead.
   1761 
   1762        The UnicodeCom field is the UTF-8 version of the File Comment field
   1763        in the header.  As UnicodeCom is defined to be UTF-8, no UTF-8 byte
   1764        order mark (BOM) is used.  The length of this field is determined by
   1765        subtracting the size of the previous fields from TSize.  If both the
   1766        File Name and Comment fields are UTF-8, the new General Purpose Bit
   1767        Flag, bit 11 (Language encoding flag (EFS)), can be used to indicate
   1768        both the header File Name and Comment fields are UTF-8 and, in this
   1769        case, the Unicode Path and Unicode Comment extra fields are not
   1770        needed and SHOULD NOT be created.  Note that, for backward
   1771        compatibility, bit 11 SHOULD only be used if the native character set
   1772        of the paths and comments being zipped up is already UTF-8. It is
   1773        expected that the same file comment storage method, either general
   1774        purpose bit 11 or extra fields, be used in both the Local and Central
   1775        Directory Header for a file.
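The Version and CRC checks described above can be sketched in Python as follows. `data` is assumed to be the field's data area (after Header ID and TSize), and `header_comment` the raw File Comment bytes from the central directory header. zlib.crc32 computes the same CRC-32 that zip uses. The function name is hypothetical.

```python
import struct
import zlib
from typing import Optional


def read_unicode_comment(data: bytes, header_comment: bytes) -> Optional[str]:
    """Return the UTF-8 comment, or None when the field SHOULD be ignored."""
    if len(data) < 5 or data[0] != 1:  # unrecognized Version: do not use
        return None
    (com_crc,) = struct.unpack_from("<I", data, 1)
    if com_crc != zlib.crc32(header_comment):  # header comment changed since
        return None
    return data[5:].decode("utf-8")  # length = TSize - 5 (Version + ComCRC32)
```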
   1776 
   1777 
   1778    4.6.9 -Info-ZIP Unicode Path Extra Field (0x7075):
   1779 
   1780        Stores the UTF-8 version of the file name field as stored in the
   1781        local header and central directory header. (Last Revision 20070912)
   1782 
   1783          Value         Size        Description
   1784          -----         ----        -----------
   1785  (UPath) 0x7075        Short       tag for this extra block type ("up")
   1786          TSize         Short       total data size for this block
   1787          Version       1 byte      version of this extra field, currently 1
   1788          NameCRC32     4 bytes     File Name Field CRC32 Checksum
   1789          UnicodeName   Variable    UTF-8 version of the entry File Name
   1790 
   1791       Currently Version is set to the number 1.  If there is a need
   1792       to change this field, the version will be incremented.  Changes
   1793       might not be backward compatible, so this extra field SHOULD NOT be
   1794       used if the version is not recognized.
   1795 
   1796       The NameCRC32 is the standard zip CRC32 checksum of the File Name
   1797       field in the header.  This is used to verify that the header
   1798       File Name field has not changed since the Unicode Path extra field
   1799       was created.  This can happen if a utility renames the File Name but
   1800       does not update the UTF-8 path extra field.  If the CRC check fails,
   1801       this UTF-8 Path Extra Field SHOULD be ignored and the File Name field
   1802       in the header SHOULD be used instead.
   1803 
   1804       The UnicodeName is the UTF-8 version of the contents of the File Name
   1805       field in the header.  As UnicodeName is defined to be UTF-8, no UTF-8
   1806       byte order mark (BOM) is used.  The length of this field is determined
   1807       by subtracting the size of the previous fields from TSize.  If both
   1808       the File Name and Comment fields are UTF-8, the new General Purpose
   1809       Bit Flag, bit 11 (Language encoding flag (EFS)), can be used to
   1810       indicate that both the header File Name and Comment fields are UTF-8
   1811       and, in this case, the Unicode Path and Unicode Comment extra fields
   1812       are not needed and SHOULD NOT be created.  Note that, for backward
   1813       compatibility, bit 11 SHOULD only be used if the native character set
   1814       of the paths and comments being zipped up is already UTF-8. It is
   1815       expected that the same file name storage method, either general
   1816       purpose bit 11 or extra fields, be used in both the Local and Central
   1817       Directory Header for a file.
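On the writing side, a Unicode Path field can be built as shown below. This Python sketch computes NameCRC32 over the File Name bytes exactly as they are written to the header. When the name is already UTF-8, setting general purpose bit 11 instead makes the field unnecessary. The function name is hypothetical.

```python
import struct
import zlib


def make_unicode_path_field(header_name: bytes, unicode_name: str) -> bytes:
    """Build a complete 0x7075 extra field for one header File Name."""
    data = (bytes([1])                                  # Version
            + struct.pack("<I", zlib.crc32(header_name))  # NameCRC32
            + unicode_name.encode("utf-8"))               # no BOM
    return struct.pack("<HH", 0x7075, len(data)) + data
```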
   1818  
   1819 
   1820    4.6.10 -Microsoft Open Packaging Growth Hint (0xa220):
   1821 
   1822           Value         Size        Description
   1823           -----         ----        -----------
   1824           0xa220        Short       tag for this extra block type
   1825           TSize         Short       size of Sig + PadVal + Padding
   1826           Sig           Short       verification signature (A028)
   1827           PadVal        Short       Initial padding value
   1828           Padding       variable    filled with NULL characters
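This Python sketch builds the growth hint block. The Sig value A028 is written as a little-endian short. PadVal is assumed to hold the initial padding length; the table only calls it an "initial padding value", so treat that interpretation and the function name as assumptions.

```python
import struct


def make_growth_hint(padding_len: int) -> bytes:
    """Build a 0xa220 block whose TSize covers Sig + PadVal + Padding."""
    sig = 0xA028
    pad_val = padding_len  # assumption: PadVal records the initial padding length
    data = struct.pack("<HH", sig, pad_val) + b"\x00" * padding_len
    return struct.pack("<HH", 0xA220, len(data)) + data
```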
   1829 
   1830    4.6.11 -Data Stream Alignment (Apache Commons-Compress) (0xa11e):
   1831 
   1832        (per Zbynek Vyskovsky) Defines alignment of data stream of this 
   1833        entry within the zip archive.  Additionally, indicates whether the 
   1834        compression method should be kept when re-compressing the zip file.
   1835 
   1836        The purpose of this extra field is to align specific resources to 
   1837        word or page boundaries so they can be easily mapped into memory.  
   1838 
   1839          Value         Size        Description
   1840          -----         ----        -----------
   1841          0xa11e        Short       tag for this extra block type
   1842          TSize         Short       total data size for this block (2+padding)
   1843          alignment     Short       required alignment and indicator
   1844          0x00          Variable    padding
   1845 
   1846        The alignment field (lower 15 bits) defines the minimal alignment 
   1847        required by the data stream.   Bit 15 of alignment field indicates 
   1848        whether the compression method of this entry can be changed when 
   1849        recompressing the zip file.  The value 0 means the compression method 
   1850        should not be changed.  The value 1 indicates the compression method
   1851        may be changed. The padding field contains padding to ensure the correct 
   1852        alignment.  It can be changed at any time when the offset or required 
   1853        alignment changes. (see https://issues.apache.org/jira/browse/COMPRESS-391)
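The padding needed for a given alignment can be computed from where the entry's data will start. This Python sketch uses the 30-byte fixed size of the local file header. It assumes `other_extra_len` is the total length of all other extra fields in the same local header. The field's own 6 bytes (Tag, TSize, alignment) are counted before the padding is chosen. The function and parameter names are hypothetical.

```python
import struct

LOCAL_HEADER_FIXED = 30  # fixed part of a local file header, before name and extra


def make_alignment_field(header_offset: int, name_len: int, other_extra_len: int,
                         alignment: int, allow_method_change: bool = False) -> bytes:
    """Build a 0xa11e block so the entry's data starts on an `alignment` boundary."""
    if not 1 <= alignment < 0x8000:
        raise ValueError("alignment must fit in the lower 15 bits")
    # Offset where the data would start with zero bytes of padding.
    base = header_offset + LOCAL_HEADER_FIXED + name_len + other_extra_len + 6
    padding = (-base) % alignment
    value = alignment | (0x8000 if allow_method_change else 0)  # bit 15 flag
    data = struct.pack("<H", value) + b"\x00" * padding
    return struct.pack("<HH", 0xA11E, len(data)) + data
```

Because the padding depends on the header offset, the field has to be recomputed whenever an entry moves within the archive, as the paragraph above notes.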


4.7 Manifest Files
------------------

    4.7.1 Applications using ZIP files MAY have a need for additional
    information that MUST be included with the files placed into
    a ZIP file. Application-specific information that cannot be
    stored using the defined ZIP storage records SHOULD be stored
    using the extensible Extra Field convention defined in this
    document.  However, some applications MAY use a manifest
    file as a means for storing additional information.  One
    example is the META-INF/MANIFEST.MF file used in ZIP formatted
    files having the .JAR extension (JAR files).

    4.7.2 A manifest file is a file created for the application process
    that requires this information.  A manifest file MAY be of any
    file type required by the defining application process.  It is
    placed within the same ZIP file as the files to which this
    information applies. By convention, this file is typically the first
    file placed into the ZIP file and it MAY include a defined directory
    path.

    4.7.3 Manifest files MAY be compressed or encrypted as needed for
    application processing of the files inside the ZIP files.

    Manifest files are outside of the scope of this specification.


5.0 Explanation of compression methods
--------------------------------------


5.1 UnShrinking - Method 1
--------------------------

    5.1.1 Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm
    with partial clearing.  The initial code size is 9 bits, and the
    maximum code size is 13 bits.  Shrinking differs from conventional
    Dynamic Ziv-Lempel-Welch implementations in several respects:

    5.1.2 The code size is controlled by the compressor, and is
    not automatically increased when codes larger than the current
    code size are created (but not necessarily used).  When
    the decompressor encounters the code sequence 256
    (decimal) followed by 1, it SHOULD increase the code size
    read from the input stream to the next bit size.  No
    blocking of the codes is performed, so the next code at
    the increased size SHOULD be read from the input stream
    immediately after where the previous code at the smaller
    bit size was read.  Again, the decompressor SHOULD NOT
    increase the code size used until the sequence 256,1 is
    encountered.
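
    The code-size escape can be illustrated with a small reader for the
    Shrink code stream.  This Python sketch assumes codes are packed least
    significant bit first, which the text above does not state, and it only
    tracks the 256,1 and 256,2 sequences; it does not maintain the string
    table.  The function names and the PARTIAL_CLEAR marker are
    hypothetical.

```python
from typing import Iterator

MIN_CODE_SIZE, MAX_CODE_SIZE = 9, 13
PARTIAL_CLEAR = -1  # hypothetical marker reported for the 256,2 sequence


def pack_codes(codes: list[tuple[int, int]]) -> bytes:
    """Pack (value, width) pairs LSB-first; handy for building test input."""
    acc = nbits = 0
    for value, width in codes:
        acc |= value << nbits
        nbits += width
    return acc.to_bytes((nbits + 7) // 8, "little")


def shrink_codes(data: bytes) -> Iterator[int]:
    """Yield data codes, honouring the 256,1 code-size escape."""
    acc = nbits = pos = 0
    size = MIN_CODE_SIZE

    def read(width):
        nonlocal acc, nbits, pos
        while nbits < width:
            if pos == len(data):
                return None                    # input exhausted
            acc |= data[pos] << nbits
            pos += 1
            nbits += 8
        value = acc & ((1 << width) - 1)
        acc >>= width
        nbits -= width
        return value

    while (code := read(size)) is not None:
        if code != 256:
            yield code
            continue
        sub = read(size)                       # second code, same width
        if sub == 1 and size < MAX_CODE_SIZE:
            size += 1                          # next code follows directly
        elif sub == 2:
            yield PARTIAL_CLEAR
```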

    5.1.3 When the table becomes full, total clearing is not
    performed.  Rather, when the compressor emits the code
    sequence 256,2 (decimal), the decompressor SHOULD clear
    all leaf nodes from the Ziv-Lempel tree, and continue to
    use the current code size.  The nodes that are cleared
    from the Ziv-Lempel tree are then re-used, with the lowest
    code value re-used first, and the highest code value
    re-used last.  The compressor can emit the sequence 256,2
    at any time.
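
    Partial clearing can be sketched over a parent-pointer view of the
    tree, which is an assumed representation rather than one defined by
    this specification: the dictionary parent maps each assigned code above
    256 to the code of its prefix string, and a leaf is any code that no
    other code uses as its prefix.  The freed codes are returned lowest
    first, matching the reuse order described above.  The function name is
    hypothetical.

```python
def partial_clear(parent: dict[int, int]) -> list[int]:
    """Remove leaf nodes from the code table and return the freed codes.

    parent maps each assigned code (> 256) to its prefix code. Codes 0-255
    are the single-byte roots and are never cleared.
    """
    prefixes = set(parent.values())
    leaves = [code for code in parent if code not in prefixes]
    for code in leaves:
        del parent[code]
    # Freed codes are handed out again starting with the lowest value.
    return sorted(leaves)
```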

5.2 Expanding - Methods 2-5
---------------------------

    5.2.1 The Reducing algorithm is actually a combination of two
    distinct algorithms.  The first algorithm compresses repeated
    byte sequences, and the second algorithm takes the compressed
    stream from the first algorithm and applies a probabilistic
    compression method.

    5.2.2 The probabilistic compression stores an array of 'follower
    sets' S(j), for j=0 to 255, corresponding to each possible
    byte value.  Each set contains between 0 and 32
    characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.
    The sets are stored at the beginning of the data area for a
    Reduced file, in reverse order, with S(255) first, and S(0)
    last.

    5.2.3 The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },
    where N(j) is the size of set S(j).  N(j) can be 0, in which
    case the follower set for S(j) is empty.  Each N(j) value is
    encoded in 6 bits, followed by N(j) 8-bit character values
    corresponding to S(j)[0] to S(j)[N(j)-1] respectively.  If
    N(j) is 0, then no values for S(j) are stored, and the value
    for N(j-1) immediately follows.

    5.2.4 Immediately after the follower sets is the compressed data
    stream.  The compressed data stream can be interpreted for the
    probabilistic decompression as follows:

    let Last-Character <- 0.
    loop until done
        if the follower set S(Last-Character) is empty then
            read 8 bits from the input stream, and copy this
            value to the output stream.
        otherwise if the follower set S(Last-Character) is non-empty then
            read 1 bit from the input stream.
            if this bit is not zero then
                read 8 bits from the input stream, and copy this
                value to the output stream.
            otherwise if this bit is zero then
                read B(N(Last-Character)) bits from the input
                stream, and assign this value to I.
                Copy the value of S(Last-Character)[I] to the
                output stream.

        assign the last value placed on the output stream to
        Last-Character.
    end loop

    B(N(j)) is defined as the minimal number of bits required to
    encode the value N(j)-1.
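
    Sections 5.2.3 and 5.2.4 combine into a single decoding pass.  The
    Python sketch below is illustrative only and rests on two assumptions
    not stated above: bits are read least significant bit first, and B(N)
    is never less than 1, so a one-element follower set still costs one
    index bit.  Both should be checked against a reference decoder.  All
    names are hypothetical, and the caller supplies the number of bytes to
    produce because the text only says "loop until done".

```python
class BitReader:
    """Reads bit fields least significant bit first (an assumption)."""

    def __init__(self, data: bytes) -> None:
        self.data, self.pos, self.acc, self.nbits = data, 0, 0, 0

    def read(self, width: int) -> int:
        while self.nbits < width:
            if self.pos == len(self.data):
                raise EOFError("compressed data exhausted")
            self.acc |= self.data[self.pos] << self.nbits
            self.pos += 1
            self.nbits += 8
        value = self.acc & ((1 << width) - 1)
        self.acc >>= width
        self.nbits -= width
        return value


def read_follower_sets(bits: BitReader) -> list[list[int]]:
    """Read S(255) down to S(0): a 6-bit N(j), then N(j) 8-bit values."""
    sets: list[list[int]] = [[] for _ in range(256)]
    for j in range(255, -1, -1):
        count = bits.read(6)
        sets[j] = [bits.read(8) for _ in range(count)]
    return sets


def index_bits(n: int) -> int:
    """B(N): bits needed to encode N-1, assumed never to drop below 1."""
    return max(1, (n - 1).bit_length())


def unreduce_probabilistic(data: bytes, n_bytes: int) -> bytes:
    """Undo the probabilistic stage, producing n_bytes of output."""
    bits = BitReader(data)
    sets = read_follower_sets(bits)
    out = bytearray()
    last = 0
    while len(out) < n_bytes:
        followers = sets[last]
        if not followers or bits.read(1):
            last = bits.read(8)                          # literal byte
        else:
            last = followers[bits.read(index_bits(len(followers)))]
        out.append(last)
    return bytes(out)
```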

    5.2.5 The decompressed stream from above can then be expanded to
    re-create the original file as follows:

    let State <- 0.

    loop until done
        read 8 bits from the input stream into C.
        case State of
            0:  if C is not equal to DLE (144 decimal) then
                   copy C to the output stream.
                 otherwise if C is equal to DLE then
                   let State <- 1.

            1:  if C is non-zero then
                   let V <- C.
                   let Len <- L(V)
                   let State <- F(Len).
                 otherwise if C is zero then
                   copy the value 144 (decimal) to the output stream.
                   let State <- 0

            2:  let Len <- Len + C
                let State <- 3.

            3:  move backwards D(V,C) bytes in the output stream
                (if this position is before the start of the output
                stream, then assume that all the data before the
                start of the output stream is filled with zeros).
                copy Len+3 bytes from this position to the output stream.
                let State <- 0.
        end case
    end loop

    The functions F, L, and D are dependent on the 'compression
    factor', 1 through 4 (compression methods 2 through 5
    respectively), and are defined as follows:

    For compression factor 1:
        L(X) equals the lower 7 bits of X.
        F(X) equals 2 if X equals 127 otherwise F(X) equals 3.
        D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.
    For compression factor 2:
        L(X) equals the lower 6 bits of X.
        F(X) equals 2 if X equals 63 otherwise F(X) equals 3.
        D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.
    For compression factor 3:
        L(X) equals the lower 5 bits of X.
        F(X) equals 2 if X equals 31 otherwise F(X) equals 3.
        D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.
    For compression factor 4:
        L(X) equals the lower 4 bits of X.
        F(X) equals 2 if X equals 15 otherwise F(X) equals 3.
        D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.
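
    Because the second stage works on whole bytes, the state machine
    transcribes almost directly.  This Python sketch derives L, F and D
    from the compression factor; expand_reduced is a hypothetical name,
    and the loop simply ends when the first-stage output runs out.

```python
DLE = 144


def expand_reduced(stage1: bytes, factor: int) -> bytes:
    """Undo the DLE/length/distance stage for compression factor 1-4."""
    if factor not in (1, 2, 3, 4):
        raise ValueError("compression factor must be 1 through 4")
    len_mask = 0xFF >> factor          # L(X): lower 8 - factor bits of X
    out = bytearray()
    state = v = length = 0
    for c in stage1:
        if state == 0:
            if c != DLE:
                out.append(c)
            else:
                state = 1
        elif state == 1:
            if c != 0:
                v = c
                length = v & len_mask                      # Len <- L(V)
                state = 2 if length == len_mask else 3     # State <- F(Len)
            else:
                out.append(DLE)                            # DLE,0 is a literal 144
                state = 0
        elif state == 2:
            length += c
            state = 3
        else:
            # D(V,C) = (upper `factor` bits of V) * 256 + C + 1
            distance = (v >> (8 - factor)) * 256 + c + 1
            start = len(out) - distance
            for i in range(length + 3):
                pos = start + i
                # Before the start of the output counts as zeros; copying
                # one byte at a time lets a match overlap its own output.
                out.append(out[pos] if pos >= 0 else 0)
            state = 0
    return bytes(out)
```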

5.3 Imploding - Method 6
------------------------

    5.3.1 The Imploding algorithm is actually a combination of two
    distinct algorithms.  The first algorithm compresses repeated byte
    sequences using a sliding dictionary.  The second algorithm is
    used to compress the encoding of the sliding dictionary output,
    using multiple Shannon-Fano trees.

    5.3.2 The Imploding algorithm can use a 4K or 8K sliding dictionary
    size. The dictionary size used can be determined by bit 1 in the
    general purpose flag word; a 0 bit indicates a 4K dictionary
    while a 1 bit indicates an 8K dictionary.

    5.3.3 The Shannon-Fano trees are stored at the start of the
    compressed file. The number of trees stored is defined by bit 2 in
    the general purpose flag word; a 0 bit indicates two trees stored,
    a 1 bit indicates three trees are stored.  If 3 trees are stored,
    the first Shannon-Fano tree represents the encoding of the
    Literal characters; the second tree represents the encoding of
    the Length information; the third represents the encoding of the
    Distance information.  When 2 Shannon-Fano trees are stored, the
    Length tree is stored first, followed by the Distance tree.

    5.3.4 The Literal Shannon-Fano tree, if present, is used to represent
    the entire ASCII character set, and contains 256 values.  This
    tree is used to compress any data not compressed by the sliding
    dictionary algorithm.  When this tree is present, the Minimum
    Match Length for the sliding dictionary is 3.  If this tree is
    not present, the Minimum Match Length is 2.
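
    The settings that 5.3.2 through 5.3.4 derive from the general purpose
    flag word can be gathered in one place.  This Python sketch also
    records how many low-order distance bits are read directly from the
    stream (7 for an 8K dictionary, 6 for 4K, per 5.3.9); ImplodeParams
    and implode_params are hypothetical names.

```python
from typing import NamedTuple


class ImplodeParams(NamedTuple):
    dictionary_size: int
    tree_count: int
    min_match_length: int
    low_distance_bits: int


def implode_params(flags: int) -> ImplodeParams:
    """Decode the Imploding settings carried in general purpose bits 1 and 2."""
    big_dictionary = bool(flags & 0x02)   # bit 1: 8K instead of 4K
    literal_tree = bool(flags & 0x04)     # bit 2: three trees instead of two
    return ImplodeParams(
        dictionary_size=8192 if big_dictionary else 4096,
        tree_count=3 if literal_tree else 2,
        min_match_length=3 if literal_tree else 2,
        low_distance_bits=7 if big_dictionary else 6,
    )
```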

    5.3.5 The Length Shannon-Fano tree is used to compress the Length
    part of the (length,distance) pairs from the sliding dictionary
    output.  The Length tree contains 64 values, ranging from the
    Minimum Match Length, to 63 plus the Minimum Match Length.

    5.3.6 The Distance Shannon-Fano tree is used to compress the Distance
    part of the (length,distance) pairs from the sliding dictionary
    output. The Distance tree contains 64 values, ranging from 0 to
    63, representing the upper 6 bits of the distance value.  The
    distance values themselves will be between 0 and the sliding
    dictionary size, either 4K or 8K.

    5.3.7 The Shannon-Fano trees themselves are stored in a compressed
    format. The first byte of the tree data represents the number of
    bytes of data representing the (compressed) Shannon-Fano tree
    minus 1.  The remaining bytes represent the Shannon-Fano tree
    data encoded as:

        High 4 bits: (Number of values at this bit length) - 1; count is 1 - 16
        Low  4 bits: (Bit length needed to represent value) - 1; length is 1 - 16

    5.3.8 The Shannon-Fano codes can be constructed from the bit lengths
    using the following algorithm:

    1)  Sort the Bit Lengths in ascending order, keeping equal lengths
        in the order in which they were stored in the file and
        remembering the original position of each length.

    2)  Generate the Shannon-Fano trees:

        Code <- 0
        CodeIncrement <- 0
        LastBitLength <- 0
        i <- number of Shannon-Fano codes - 1   (either 255 or 63)

        loop while i >= 0
            Code = Code + CodeIncrement
            if BitLength(i) <> LastBitLength then
                LastBitLength=BitLength(i)
                CodeIncrement = 1 shifted left (16 - LastBitLength)
            ShannonCode(i) = Code
            i <- i - 1
        end loop

    3)  Reverse the order of all the bits in the above ShannonCode()
        vector, so that the most significant bit becomes the least
        significant bit.  For example, the value 0x1234 (hex) would
        become 0x2C48 (hex).

    4)  Restore the order of Shannon-Fano codes as originally stored
        within the file.

    Example:

        This example shows the encoding of a Shannon-Fano tree
        of size 8.  Note that the actual Shannon-Fano trees used
        for Imploding are either 64 or 256 entries in size.

    Example:   0x02, 0x42, 0x01, 0x13

        The first byte (0x02) indicates that 3 bytes of tree data
        follow.  Decoding those bytes:
                0x42 = 5 codes of 3 bits long
                0x01 = 1 code  of 2 bits long
                0x13 = 2 codes of 4 bits long

        This would generate the original bit length array of:
        (3, 3, 3, 3, 3, 2, 4, 4)

        There are 8 codes in this table for the values 0 thru 7.  Using
        the algorithm to obtain the Shannon-Fano codes produces:

                                      Reversed     Order     Original
    Val  Sorted   Constructed Code      Value     Restored    Length
    ---  ------   -----------------   --------    --------    ------
    0:     2      1100000000000000        11       101          3
    1:     3      1010000000000000       101       001          3
    2:     3      1000000000000000       001       110          3
    3:     3      0110000000000000       110       010          3
    4:     3      0100000000000000       010       100          3
    5:     3      0010000000000000       100        11          2
    6:     4      0001000000000000      1000      1000          4
    7:     4      0000000000000000      0000      0000          4

    The values in the Val, Order Restored and Original Length columns
    now represent the Shannon-Fano encoding tree that can be used for
    decoding the Shannon-Fano encoded data.  How to parse the
    variable length Shannon-Fano values from the data stream is beyond
    the scope of this document.  (See the references listed at the end of
    this document for more information.)  However, traditional decoding
    schemes used for Huffman variable length decoding, such as the
    Greenlaw algorithm, can be successfully applied.
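
    Steps 1 through 4 can be followed almost literally.  In this Python
    sketch, the stable sort keeps equal lengths in stored order, codes are
    assigned from the longest length upward, and only the significant bits
    of each 16-bit code are reversed, which is what the Reversed Value
    column above shows.  The function name is hypothetical.

```python
def shannon_fano_codes(lengths: list[int]) -> list[int]:
    """Return the bit-reversed code for each value, in stored order."""
    # Step 1: stable sort, remembering each length's original position.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    sorted_lengths = [lengths[i] for i in order]

    # Step 2: assign 16-bit codes from the last (longest) entry upward.
    codes = [0] * len(lengths)
    code = increment = last_length = 0
    for i in range(len(lengths) - 1, -1, -1):
        code += increment
        if sorted_lengths[i] != last_length:
            last_length = sorted_lengths[i]
            increment = 1 << (16 - last_length)
        codes[i] = code

    # Steps 3 and 4: keep the top `length` bits, reverse them, and put
    # each code back at its value's original position.
    result = [0] * len(lengths)
    for sorted_pos, value in enumerate(order):
        length = sorted_lengths[sorted_pos]
        top = codes[sorted_pos] >> (16 - length)
        result[value] = int(format(top, f"0{length}b")[::-1], 2)
    return result
```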

    5.3.9 The compressed data stream begins immediately after the
    compressed Shannon-Fano data.  The compressed data stream can be
    interpreted as follows:

    loop until done
        read 1 bit from input stream.

        if this bit is non-zero then       (encoded data is literal data)
            if Literal Shannon-Fano tree is present
                read and decode character using Literal Shannon-Fano tree.
            otherwise
                read 8 bits from input stream.
            copy character to the output stream.
        otherwise              (encoded data is sliding dictionary match)
            if 8K dictionary size
                read 7 bits for offset Distance (lower 7 bits of offset).
            otherwise
                read 6 bits for offset Distance (lower 6 bits of offset).

            using the Distance Shannon-Fano tree, read and decode the
              upper 6 bits of the Distance value.

            using the Length Shannon-Fano tree, read and decode
              the Length value.

            Length <- Length + Minimum Match Length

            if Length = 63 + Minimum Match Length
                read 8 bits from the input stream,
                add this value to Length.

            move backwards Distance+1 bytes in the output stream, and
            copy Length characters from this position to the output
            stream.  (if this position is before the start of the output
            stream, then assume that all the data before the start of
            the output stream is filled with zeros).
    end loop
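
    The match branch of the loop assembles the distance from two parts and
    then copies from earlier output.  This Python sketch covers only that
    branch and assumes the raw low-order bits, the decoded upper 6 bits and
    the final length are already available; implode_copy is a hypothetical
    name.

```python
def implode_copy(out: bytearray, low_bits: int, low_bit_count: int,
                 upper6: int, length: int) -> None:
    """Append one sliding-dictionary match to out.

    low_bit_count is 7 for an 8K dictionary and 6 for a 4K dictionary.
    length is the final match length, including the Minimum Match Length
    and any extra byte added when the length code was 63.
    """
    distance = (upper6 << low_bit_count) | low_bits
    start = len(out) - (distance + 1)
    for i in range(length):
        pos = start + i
        # Bytes before the start of the output are treated as zeros, and
        # copying one byte at a time supports overlapping matches.
        out.append(out[pos] if pos >= 0 else 0)
```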

5.4 Tokenizing - Method 7
-------------------------

    5.4.1 This method is not used by PKZIP.

5.5 Deflating - Method 8
------------------------

    5.5.1 The Deflate algorithm is similar to the Implode algorithm using
    a sliding dictionary of up to 32K with secondary compression
    from Huffman/Shannon-Fano codes.

    5.5.2 The compressed data is stored in blocks with a header describing
    the block and the Huffman codes used in the data block.  The header
    format is as follows:

       Bit 0: Last Block bit     This bit is set to 1 if this is the last
                                 compressed block in the data.
       Bits 1-2: Block type
          00 (0) - Block is stored - All stored data is byte aligned.
                   Skip bits until the next byte boundary; the next word
                   (2 bytes, little-endian) is the block length, followed
                   by the ones' complement of the block length word.
                   Remaining data in the block is the stored data.

          01 (1) - Use fixed Huffman codes for literal and distance codes.
                   Lit Code    Bits             Dist Code   Bits
                   ---------   ----             ---------   ----
                     0 - 143    8                 0 - 31      5
                   144 - 255    9
                   256 - 279    7
                   280 - 287    8

                   Literal codes 286-287 and distance codes 30-31 are
                   never used but participate in the Huffman construction.

          10 (2) - Dynamic Huffman codes.  (See expanding Huffman codes)

          11 (3) - Reserved - Flag an "Error in compressed data" if seen.
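
    The header fields and the fixed code lengths for block type 01 can be
    expressed directly.  This Python sketch assumes the header bits occupy
    the least significant bits of the first byte, which is how RFC 1951
    packs Deflate data elements; the function names are hypothetical.

```python
def parse_block_header(first_byte: int) -> tuple[bool, int]:
    """Return (is_last_block, block_type) from the low 3 bits of a byte."""
    is_last = bool(first_byte & 0x01)        # bit 0
    block_type = (first_byte >> 1) & 0x03    # bits 1-2
    if block_type == 3:
        raise ValueError("Error in compressed data: reserved block type")
    return is_last, block_type


def fixed_code_lengths() -> tuple[list[int], list[int]]:
    """Bit lengths for the fixed literal/length and distance codes."""
    literal = [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8   # codes 0-287
    distance = [5] * 32                                    # codes 0-31
    return literal, distance
```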

    5.5.3 Expanding Huffman Codes

    If the data block is stored with dynamic Huffman codes, the Huffman
    codes are sent in the following compressed format:

       5 Bits: # of Literal/Length codes - 257  (257 - 286)
               All other codes are never sent.
       5 Bits: # of Dist codes - 1              (1 - 32)
       4 Bits: # of Bit Length codes - 4        (4 - 19)

    The Huffman codes are sent as bit lengths and the codes are built as
    described in the implode algorithm.  The bit lengths themselves are
    compressed with Huffman codes.  There are 19 bit length codes:

       0 - 15: Represent bit lengths of 0 - 15
           16: Copy the previous bit length 3 - 6 times.
               The next 2 bits indicate repeat length (0 = 3, ..., 3 = 6)
                  Example:  Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
                            expand to 12 bit lengths of 8 (1 + 6 + 5)
           17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
           18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)
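
    Once the code length symbols themselves have been decoded, expanding
    them into a flat list of bit lengths is bookkeeping.  In this Python
    sketch the caller passes (symbol, extra) pairs, where extra is the
    value of the 2, 3 or 7 trailing bits (0 for symbols 0 - 15);
    expand_code_lengths is a hypothetical name.

```python
def expand_code_lengths(symbols: list[tuple[int, int]]) -> list[int]:
    """Apply the 0-18 code length alphabet to produce a list of bit lengths."""
    lengths: list[int] = []
    for symbol, extra in symbols:
        if symbol <= 15:
            lengths.append(symbol)                        # literal bit length
        elif symbol == 16:
            if not lengths:
                raise ValueError("code 16 with no previous length")
            lengths.extend([lengths[-1]] * (3 + extra))   # 2 extra bits: 3-6
        elif symbol == 17:
            lengths.extend([0] * (3 + extra))             # 3 extra bits: 3-10
        elif symbol == 18:
            lengths.extend([0] * (11 + extra))            # 7 extra bits: 11-138
        else:
            raise ValueError(f"invalid code length symbol {symbol}")
    return lengths
```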

    The lengths of the bit length codes are sent packed 3 bits per value
    (0 - 7) in the following order:

       16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

    The Huffman codes SHOULD be built as described in the Implode algorithm
    except codes are assigned starting at the shortest bit length, i.e., the
    shortest code SHOULD be all 0's rather than all 1's.  Also, codes with
    a bit length of zero do not participate in the tree construction.  The
    codes are then used to decode the bit lengths for the literal and
    distance tables.
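
    The Deflate construction is the canonical code assignment of RFC 1951,
    section 3.2.2: count the codes of each length, compute the first code
    for each length, then hand out consecutive values, skipping
    zero-length entries.  This Python sketch returns codes most significant
    bit first; reversing them to match the stream's bit order is left to
    the caller.  With the same lengths as the Imploding example, the
    shortest code comes out as 00 rather than 11.  The function name is
    hypothetical.

```python
from typing import Optional


def canonical_codes(lengths: list[int]) -> list[Optional[int]]:
    """Assign Deflate-style canonical codes; None marks unused symbols."""
    max_len = max(lengths, default=0)
    count = [0] * (max_len + 1)
    for n in lengths:
        if n:                              # zero-length codes are skipped
            count[n] += 1
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + count[bits - 1]) << 1
        next_code[bits] = code
    codes: list[Optional[int]] = []
    for n in lengths:
        if n:
            codes.append(next_code[n])
            next_code[n] += 1
        else:
            codes.append(None)
    return codes
```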

    The bit lengths for the literal tables are sent first with the number
    of entries sent described by the 5 bits sent earlier.  There are up
    to 286 literal characters; the first 256 represent the respective
    8-bit character, code 256 represents the End-Of-Block code, the
    remaining 29 codes represent copy lengths of 3 thru 258.  There are up
    to 30 distance codes representing distances from 1 thru 32k as
    described below.

                                 Length Codes
                                 ------------
          Extra             Extra              Extra              Extra
     Code Bits Length  Code Bits Lengths  Code Bits Lengths  Code Bits Length(s)
     ---- ---- ------  ---- ---- -------  ---- ---- -------  ---- ---- ---------
      257   0     3     265   1   11,12    273   3   35-42    281   5  131-162
      258   0     4     266   1   13,14    274   3   43-50    282   5  163-194
      259   0     5     267   1   15,16    275   3   51-58    283   5  195-226
      260   0     6     268   1   17,18    276   3   59-66    284   5  227-257
      261   0     7     269   2   19-22    277   4   67-82    285   0    258
      262   0     8     270   2   23-26    278   4   83-98
      263   0     9     271   2   27-30    279   4   99-114
      264   0    10     272   2   31-34    280   4  115-130

                                Distance Codes
                                --------------
          Extra           Extra             Extra               Extra
     Code Bits Dist  Code Bits  Dist   Code Bits Distance  Code Bits Distance
     ---- ---- ----  ---- ---- ------  ---- ---- --------  ---- ---- --------
       0   0    1      8   3   17-24    16    7  257-384    24   11  4097-6144
       1   0    2      9   3   25-32    17    7  385-512    25   11  6145-8192
       2   0    3     10   4   33-48    18    8  513-768    26   12  8193-12288
       3   0    4     11   4   49-64    19    8  769-1024   27   12 12289-16384
       4   1   5,6    12   5   65-96    20    9 1025-1536   28   13 16385-24576
       5   1   7,8    13   5   97-128   21    9 1537-2048   29   13 24577-32768
       6   2   9-12   14   6  129-192   22   10 2049-3072
       7   2  13-16   15   6  193-256   23   10 3073-4096
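
    Both tables follow a regular pattern, so a decoder can generate them
    instead of copying them by hand.  This Python sketch derives (base,
    extra bits) for every length and distance code, with code 285 handled
    as the one exception; the function names are hypothetical.

```python
def length_table() -> dict[int, tuple[int, int]]:
    """Map length codes 257-285 to (base length, extra bits)."""
    table = {}
    base = 3
    for code in range(257, 285):
        extra = max(0, (code - 261) // 4)   # 0 for 257-264, then +1 every 4
        table[code] = (base, extra)
        base += 1 << extra
    table[285] = (258, 0)                   # special case: no extra bits
    return table


def distance_table() -> dict[int, tuple[int, int]]:
    """Map distance codes 0-29 to (base distance, extra bits)."""
    table = {}
    base = 1
    for code in range(30):
        extra = max(0, (code - 2) // 2)     # 0 for 0-3, then +1 every 2
        table[code] = (base, extra)
        base += 1 << extra
    return table
```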

    5.5.4 The compressed data stream begins immediately after the
    compressed header data.  The compressed data stream can be
    interpreted as follows:

    do
       read header from input stream.

       if stored block
          skip bits until byte aligned
          read count and 1's complement of count
          copy count bytes of block data
       otherwise
          loop until end of block code sent
             decode literal character from input stream
             if literal < 256
                copy character to the output stream
             otherwise
                if literal = end of block
                   break from loop
                otherwise
                   decode length from the literal code and its extra bits
                   decode distance from input stream

                   move backwards distance bytes in the output stream, and
                   copy length characters from this position to the output
                   stream.
          end loop
    while not last block

    if data descriptor exists
       skip bits until byte aligned
       read crc and sizes
    endif
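
    For method 8 data, Python's standard zlib module can do the whole job:
    a negative wbits value (-15) selects a raw Deflate stream without a
    zlib header or trailer, which is the form stored in ZIP files.  The
    sketch below also walks stored blocks by hand to show the LEN/NLEN
    layout; read_stored_blocks is a hypothetical helper that only works
    when every block in the stream is stored, so that each header starts
    on a byte boundary.

```python
import struct
import zlib


def read_stored_blocks(raw: bytes) -> bytes:
    """Decode a raw Deflate stream made only of stored (type 00) blocks."""
    out = bytearray()
    pos = 0
    while True:
        header = raw[pos]
        is_last = header & 0x01
        block_type = (header >> 1) & 0x03
        if block_type != 0:
            raise ValueError("this sketch only handles stored blocks")
        # The other 5 bits of the header byte are skipped; LEN and NLEN
        # follow as 16-bit little-endian words.
        length, nlength = struct.unpack_from("<HH", raw, pos + 1)
        if length != nlength ^ 0xFFFF:
            raise ValueError("NLEN is not the ones' complement of LEN")
        pos += 5
        out += raw[pos:pos + length]
        pos += length
        if is_last:
            return bytes(out)


def inflate_raw(raw: bytes) -> bytes:
    """Decode any raw Deflate (method 8) stream with zlib."""
    decompressor = zlib.decompressobj(-15)   # negative wbits: no zlib wrapper
    return decompressor.decompress(raw) + decompressor.flush()
```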

5.6 Enhanced Deflating - Method 9
---------------------------------

    5.6.1 The Enhanced Deflating algorithm is similar to Deflate but uses
    a sliding dictionary of up to 64K. Deflate64(tm) is supported
    by the Deflate extractor.

5.7 BZIP2 - Method 12
---------------------

    5.7.1 BZIP2 is an open-source data compression algorithm developed by
    Julian Seward.  Information and source code for this algorithm
    can be found on the internet.

5.8 LZMA - Method 14
---------------------

    5.8.1 LZMA is a block-oriented, general-purpose data compression
    algorithm developed and maintained by Igor Pavlov.  It is a derivative
    of LZ77 that utilizes Markov chains and a range coder.  Information and
    source code for this algorithm can be found on the internet.  Consult
    with the author of this algorithm for information on terms or
    restrictions on use.

    Support for LZMA within the ZIP format is defined as follows:

    5.8.2 The Compression method field within the ZIP Local and Central
    Header records will be set to the value 14 to indicate data was
    compressed using LZMA.

    5.8.3 The Version needed to extract field within the ZIP Local and
    Central Header records will be set to 6.3 to indicate the minimum
    ZIP format version supporting this feature.

    5.8.4 File data compressed using the LZMA algorithm MUST be placed
    immediately following the Local Header for the file.  If a standard
    ZIP encryption header is required, it will follow the Local Header
    and will precede the LZMA compressed file data segment.  The location
    of the LZMA compressed data segment within the ZIP format will be as
    shown:

        [local header file 1]
        [encryption header file 1]
        [LZMA compressed data segment for file 1]
        [data descriptor 1]
        [local header file 2]

    5.8.5 The encryption header and data descriptor records MAY
    be conditionally present.  The LZMA Compressed Data Segment
    will consist of an LZMA Properties Header followed by the
    LZMA Compressed Data as shown:

        [LZMA properties header for file 1]
        [LZMA compressed data for file 1]

    5.8.6 The LZMA Compressed Data will be stored as provided by the
    LZMA compression library.  Compressed size, uncompressed size, and
    other file characteristics about the file being compressed MUST be
    stored in standard ZIP storage format.

    5.8.7 The LZMA Properties Header will store specific data required
    to decompress the LZMA compressed data.  This data is set by the
    LZMA compression engine using the function WriteCoderProperties()
    as documented within the LZMA SDK.

    5.8.8 Storage fields for the property information within the LZMA
    Properties Header are as follows:

         LZMA Version Information   2 bytes
         LZMA Properties Size       2 bytes
         LZMA Properties Data       variable, defined by "LZMA Properties Size"

       5.8.8.1 LZMA Version Information - this field identifies which version
       of the LZMA SDK was used to compress a file.  The first byte will
       store the major version number of the LZMA SDK and the second
       byte will store the minor version number.

       5.8.8.2 LZMA Properties Size - this field defines the size of the
       remaining property data.  Typically this size SHOULD be determined by
       the version of the SDK.  This size field is included as a convenience
       and to help avoid any ambiguity arising in the future due
       to changes in this compression algorithm.

       5.8.8.3 LZMA Property Data - this variable-sized field records the
       required values for the decompressor as defined by the LZMA SDK.
       The data stored in this field SHOULD be obtained using
       WriteCoderProperties() in the version of the SDK identified by
       the "LZMA Version Information" field.

       5.8.8.4 The layout of the "LZMA Properties Data" field is a function
       of the LZMA compression algorithm, and the author may change this
       layout over time.  In version 4.3 of the LZMA SDK, this field is a
       5-byte array.  Its first element is a single packed byte that
       contains the following fields, and the remaining 4 bytes store the
       dictionary size in little-endian order:

         PosStateBits
         LiteralPosStateBits
         LiteralContextBits

       Refer to the LZMA documentation for a more detailed explanation of
       these fields.
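
    The packed first byte is conventionally computed as
    (PosStateBits * 5 + LiteralPosStateBits) * 9 + LiteralContextBits.
    The Python sketch below builds and reads a complete LZMA compressed
    data segment with the standard lzma module and a raw LZMA1 filter.
    The version bytes 9 and 20 are placeholders, the 2-byte size field is
    assumed to be little-endian like other ZIP fields, and the helper
    names are hypothetical.

```python
import lzma
import struct


def pack_lzma_properties(lc: int, lp: int, pb: int, dict_size: int) -> bytes:
    """5-byte properties: packed lc/lp/pb byte + little-endian dict size."""
    return struct.pack("<BI", (pb * 5 + lp) * 9 + lc, dict_size)


def unpack_lzma_properties(props: bytes) -> dict:
    packed, dict_size = struct.unpack("<BI", props)
    lc, rest = packed % 9, packed // 9
    lp, pb = rest % 5, rest // 5
    return {"lc": lc, "lp": lp, "pb": pb, "dict_size": dict_size}


def build_segment(data: bytes, lc: int = 3, lp: int = 0, pb: int = 2,
                  dict_size: int = 1 << 16) -> bytes:
    """LZMA properties header followed by the LZMA compressed data."""
    props = pack_lzma_properties(lc, lp, pb, dict_size)
    filters = [{"id": lzma.FILTER_LZMA1, **unpack_lzma_properties(props)}]
    payload = lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)
    # Placeholder version (9.20); properties size as a 2-byte field.
    header = struct.pack("<BBH", 9, 20, len(props))
    return header + props + payload


def read_segment(segment: bytes) -> bytes:
    _major, _minor, size = struct.unpack_from("<BBH", segment)
    props = segment[4:4 + size]
    filters = [{"id": lzma.FILTER_LZMA1, **unpack_lzma_properties(props)}]
    decoder = lzma.LZMADecompressor(lzma.FORMAT_RAW, filters=filters)
    return decoder.decompress(segment[4 + size:])
```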
   2427 
   2428     5.8.9 Data compressed with method 14, LZMA, MAY include an end-of-stream
   2429     (EOS) marker ending the compressed data stream.  This marker is not
   2430     required, but its use is highly recommended to facilitate processing
   2431     and implementers SHOULD include the EOS marker whenever possible.
   2432     When the EOS marker is used, general purpose bit 1 MUSY be set.  If
   2433     general purpose bit 1 is not set, the EOS marker is not present.
   2434 
   2435 5.9 WavPack - Method 97
   2436 -----------------------
   2437 
   2438     5.9.1 Information describing the use of compression method 97 is 
   2439     provided by WinZIP International, LLC.  This method relies on the
   2440     open source WavPack audio compression utility developed by David Bryant.  
   2441     Information on WavPack is available at www.wavpack.com.  Please consult 
   2442     with the author of this algorithm for information on terms and 
   2443     restrictions on use.
   2444 
   2445     5.9.2 WavPack data for a file begins immediately after the end of the
   2446     local header data.  This data is the output from WavPack compression
   2447     routines.  Within the ZIP file, the use of WavPack compression is
   2448     indicated by setting the compression method field to a value of 97 
   2449     in both the local header and the central directory header.  The Version 
   2450     needed to extract and version made by fields use the same values as are 
   2451     used for data compressed using the Deflate algorithm.
   2452 
   2453     5.9.3 When storing digital sample data using WavPack compression
   2454     within ZIP files, implementations SHOULD compress all of the bytes
   2455     of the sample data, including any unused bits up to the byte
   2456     boundary.  For example, a 2-byte sample might use only 12 bits
   2457     for the sample data, leaving 4 bits unused.  If only 12 bits are
   2458     passed as the sample size to the WavPack routines, the 4 unused
   2459     bits will be set to 0 on extraction regardless of their original
   2460     state.  To avoid this, the full 16-bit sample size SHOULD be
   2461     provided.
   2462 
   2463 5.10 PPMd - Method 98
   2464 ---------------------
   2465 
   2466     5.10.1 PPMd is a data compression algorithm developed by Dmitry Shkarin
   2467     which includes a carryless rangecoder developed by Dmitry Subbotin.
   2468     This algorithm is based on prediction by partial matching (PPM) using
   2469     multiple order contexts.  Information and source code for this algorithm
   2470     can be found on the internet. Consult with the author of this
   2471     algorithm for information on terms or restrictions on use.
   2472 
   2473     5.10.2 Support for PPMd within the ZIP format currently is provided only 
   2474     for version I, revision 1 of the algorithm.  Storage requirements
   2475     for using this algorithm are as follows:
   2476 
   2477     5.10.3 Parameters needed to control the algorithm are stored in the two
   2478     bytes immediately preceding the compressed data.  These bytes are
   2479     used to store the following fields:
   2480 
   2481     Model order - sets the maximum model order, default is 8, possible
   2482                   values are from 2 to 16 inclusive
   2483 
   2484     Sub-allocator size - sets the size of sub-allocator in MB, default is 50,
   2485                     possible values are from 1MB to 256MB inclusive
   2486 
   2487     Model restoration method - sets the method used to restart context
   2488                     model at memory insufficiency, values are:
   2489 
   2490                     0 - restarts model from scratch - default
   2491                     1 - cut off model - decreases performance by as much as 2x
   2492                     2 - freeze context tree - not recommended
   2493 
   2494     5.10.4 An example for packing these fields into the 2 byte storage field is
   2495     illustrated below.  These values are stored in Intel low-byte/high-byte
   2496     order.
   2497 
   2498     wPPMd = (Model order - 1) + 
   2499             ((Sub-allocator size - 1) << 4) + 
   2500             (Model restoration method << 12)
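The packing formula in 5.10.4 can be sketched in Python as follows, together with its inverse. This is a minimal illustration, not PKWARE code. The function names are hypothetical, and the range checks mirror the limits listed in 5.10.3: model order 2 to 16, sub-allocator 1 to 256 MB, and restoration method 0 to 2. Each field fits its own bit range. Model order - 1 occupies bits 0-3, sub-allocator size - 1 occupies bits 4-11, and the restoration method starts at bit 12.

```python
from __future__ import annotations


def pack_ppmd_params(model_order: int = 8, suballoc_mb: int = 50,
                     restore_method: int = 0) -> bytes:
    """Pack PPMd parameters into the 2-byte wPPMd field (low byte first)."""
    if not 2 <= model_order <= 16:
        raise ValueError("model order must be 2..16")
    if not 1 <= suballoc_mb <= 256:
        raise ValueError("sub-allocator size must be 1..256 MB")
    if restore_method not in (0, 1, 2):
        raise ValueError("restoration method must be 0, 1 or 2")
    w = (model_order - 1) + ((suballoc_mb - 1) << 4) + (restore_method << 12)
    return w.to_bytes(2, "little")


def unpack_ppmd_params(field: bytes) -> tuple[int, int, int]:
    """Inverse of pack_ppmd_params: (model_order, suballoc_mb, restore_method)."""
    w = int.from_bytes(field[:2], "little")
    model_order = (w & 0x0F) + 1           # bits 0-3
    suballoc_mb = ((w >> 4) & 0xFF) + 1    # bits 4-11
    restore_method = w >> 12               # bits 12-15
    return model_order, suballoc_mb, restore_method
```

With the defaults (order 8, 50 MB, method 0), the field value is 0x0317, stored as the bytes 17 03.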
   2501 
   2502 
   2503 5.11 AE-x Encryption marker - Method 99
   2504 -------------------------------------------
   2505 
   2506 5.12 JPEG variant - Method 96
   2507 -------------------------------------------
   2508 
   2509 5.13 PKWARE Data Compression Library Imploding - Method 10
   2510 -----------------------------------------------------------
   2511 
   2512 5.14 Reserved - Method 11
   2513 -------------------------------------------
   2514 
   2515 5.15 Reserved - Method 13
   2516 -------------------------------------------
   2517 
   2518 5.16 Reserved - Method 15
   2519 -------------------------------------------
   2520 
   2521 5.17 IBM z/OS CMPSC Compression - Method 16
   2522 -------------------------------------------
   2523 
   2524 Method 16 utilizes the IBM hardware compression facility available
   2525 on most IBM mainframes.  Hardware compression can significantly 
   2526 increase the speed of data compression.  This method uses a variant 
   2527 of the LZ78 algorithm.  CMPSC hardware compression is performed
   2528 using the COMPRESSION CALL instruction.  
   2529 
   2530 ZIP archives can be created using this method only on mainframes
   2531 supporting the CMPSC instruction.  Extraction MAY occur on any
   2532 platform supporting this compression algorithm.  Use of this 
   2533 algorithm requires creation of a compression dictionary and
   2534 an expansion dictionary.  The expansion dictionary MUST be
   2535 placed into the ZIP archive for use on the system where
   2536 extraction will occur.
   2537 
   2538 Additional information on this compression algorithm and dictionaries
   2539 can be found in the IBM-provided document titled IBM ESA/390 Data
   2540 Compression (SA22-7208-01). Storage requirements for using CMPSC 
   2541 compression are as follows.
   2542 
   2543 The format for the compressed data stream placed into the ZIP
   2544 archive following the Local Header is:
   2545 
   2546     [dictionary header]
   2547     [expansion dictionary]
   2548     [CMPSC compressed data] 
   2549 
   2550 If encryption is used to encrypt a file compressed with CMPSC, these 
   2551 sections MUST be encrypted as a single entity.
   2552 
   2553 The format of the dictionary header is:
   2554 
   2555           Value            Size          Description
   2556           -----            ----          -----------
   2557           Version          1 byte        1
   2558           Flags/Symsize    1 byte        Processing flags and
   2559                                          symbol size
   2560           DictionaryLen    4 bytes       Length of the 
   2561                                          expansion dictionary
   2562 
   2563 Explanation of processing flags and symbol size:
   2564 
   2565 The high 4 bits are used to store the processing flags.  The low
   2566 4 bits represent the size of a symbol, in bits (values range
   2567 from 9 to 13).  Flag values are defined below.
   2568 
   2569     0x80 - expansion dictionary
   2570     0x40 - expansion dictionary is compressed using Deflate
   2571     0x20 - Reserved
   2572     0x10 - Reserved
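The following is a minimal Python sketch of splitting a CMPSC entry into the three sections listed above. It assumes that any encryption has already been removed, since the sections are encrypted as a single entity. This section does not state a byte order for DictionaryLen, so the sketch assumes the general ZIP convention of Intel low-byte/high-byte order. The names split_cmpsc_stream and CmpscHeader are hypothetical. The sketch does not attempt decompression, which relies on the CMPSC hardware facility.

```python
from __future__ import annotations
import struct
from typing import NamedTuple

FLAG_EXPANSION_DICT = 0x80  # expansion dictionary
FLAG_DICT_DEFLATED = 0x40   # expansion dictionary is compressed using Deflate


class CmpscHeader(NamedTuple):
    version: int
    flags: int           # high 4 bits of Flags/Symsize (0x80, 0x40, ...)
    symbol_size: int     # low 4 bits: symbol size in bits, 9..13
    dictionary_len: int  # length of the expansion dictionary


def split_cmpsc_stream(data: bytes) -> tuple[CmpscHeader, bytes, bytes]:
    """Split decrypted entry data into header, expansion dictionary, compressed data."""
    if len(data) < 6:
        raise ValueError("truncated CMPSC dictionary header")
    # ASSUMPTION: DictionaryLen is little-endian (the general ZIP convention).
    version, flags_symsize, dict_len = struct.unpack_from("<BBI", data, 0)
    header = CmpscHeader(version, flags_symsize & 0xF0, flags_symsize & 0x0F, dict_len)
    if not 9 <= header.symbol_size <= 13:
        raise ValueError(f"symbol size {header.symbol_size} outside 9..13")
    end = 6 + dict_len
    if len(data) < end:
        raise ValueError("truncated expansion dictionary")
    return header, data[6:end], data[end:]
```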
   2573 
   2574 
   2575 5.18 Reserved - Method 17
   2576 -------------------------------------------
   2577 
   2578 5.19 IBM TERSE - Method 18
   2579 -------------------------------------------
   2580 
   2581 5.20 IBM LZ77 z Architecture - Method 19
   2582 -----------------------------------------
   2583 
   2584 6.0  Traditional PKWARE Encryption
   2585 ----------------------------------
   2586 
   2587     6.0.1 The following information discusses the decryption steps
   2588     required to support traditional PKWARE encryption.  This
   2589     form of encryption is considered weak by today's standards,
   2590     and its use is recommended only for situations with
   2591     low security needs or for compatibility with older .ZIP 
   2592     applications.
   2593 
   2594 6.1 Traditional PKWARE Decryption
   2595 ---------------------------------
   2596 
   2597     6.1.1 PKWARE is grateful to Mr. Roger Schlafly for his expert 
   2598     contribution towards the development of PKWARE's traditional 
   2599     encryption.
   2600 
   2601     6.1.2 PKZIP encrypts the compressed data stream.  Encrypted files 
   2602     MUST be decrypted before they can be extracted to their original
   2603     form.
   2604 
   2605     6.1.3 Each encrypted file has an extra 12 bytes stored at the start 
   2606     of the data area defining the encryption header for that file.  The
   2607     encryption header is initially set to random values and is then
   2608     itself encrypted using three 32-bit keys.  The key values are
   2609     initialized using the supplied encryption password.  After each byte
   2610     is encrypted, the keys are updated using pseudo-random number
   2611     generation techniques in combination with the same CRC-32 algorithm
   2612     used in PKZIP and described elsewhere in this document.
   2613 
   2614     6.1.4 The following are the basic steps required to decrypt a file:
   2615 
   2616     1) Initialize the three 32-bit keys with the password.
   2617     2) Read and decrypt the 12-byte encryption header, further
   2618        initializing the encryption keys.
   2619     3) Read and decrypt the compressed data stream using the
   2620        encryption keys.
   2621 
   2622     6.1.5 Initializing the encryption keys
   2623         
   2624     Key(0) <- 305419896
   2625     Key(1) <- 591751049
   2626     Key(2) <- 878082192
   2627 
   2628     loop for i <- 0 to length(password)-1
   2629         update_keys(password(i))
   2630     end loop
   2631 
   2632     Where update_keys() is defined as:
   2633 
   2634     update_keys(char):
   2635       Key(0) <- crc32(Key(0),char)
   2636       Key(1) <- Key(1) + (Key(0) & 000000ffH)
   2637       Key(1) <- Key(1) * 134775813 + 1
   2638       Key(2) <- crc32(Key(2),Key(1) >> 24)
   2639     end update_keys
   2640 
   2641     Where crc32(old_crc,char) is a routine that, given a CRC value and a
   2642     character, returns an updated CRC value after applying the CRC-32
   2643     algorithm described elsewhere in this document.
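The key setup above can be sketched in Python. The sketch reads crc32(old_crc,char) as a single raw step of the table-driven CRC-32. That step omits the initial and final bit inversion used when computing a file's CRC. This is how common implementations read the routine, but treat it as an assumption of the sketch. Key arithmetic is reduced modulo 2**32 because the keys are 32-bit values. The names crc32_step and ZipCryptoKeys are hypothetical.

```python
from __future__ import annotations


def _make_crc_table() -> list[int]:
    """Reflected CRC-32 table (polynomial 0xEDB88320), as used for ZIP CRCs."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _make_crc_table()


def crc32_step(crc: int, byte: int) -> int:
    """ASSUMPTION: crc32(old_crc,char) is one raw table step, no inversions."""
    return CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)


class ZipCryptoKeys:
    def __init__(self, password: bytes) -> None:
        # Initial values from 6.1.5.
        self.k0, self.k1, self.k2 = 305419896, 591751049, 878082192
        for b in password:
            self.update(b)

    def update(self, char: int) -> None:
        """update_keys(char) from 6.1.5, with 32-bit wraparound."""
        self.k0 = crc32_step(self.k0, char)
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = crc32_step(self.k2, self.k1 >> 24)
```

With an empty password, the keys keep their initial values: 0x12345678, 0x23456789 and 0x34567890, which are the decimal constants above written in hexadecimal.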
   2644 
   2645     6.1.6 Decrypting the encryption header
   2646         
   2647     The purpose of this step is to further initialize the encryption
   2648     keys, based on random data, to render a plaintext attack on the
   2649     data ineffective.
   2650 
   2651     Read the 12-byte encryption header into Buffer, in locations
   2652     Buffer(0) through Buffer(11).
   2653 
   2654     loop for i <- 0 to 11
   2655         C <- Buffer(i) ^ decrypt_byte()
   2656         update_keys(C)
   2657         Buffer(i) <- C
   2658     end loop
   2659 
   2660     Where decrypt_byte() is defined as:
   2661 
   2662     unsigned char decrypt_byte()
   2663         local unsigned short temp
   2664         temp <- Key(2) | 2
   2665         decrypt_byte <- (temp * (temp ^ 1)) >> 8
   2666     end decrypt_byte
   2667 
   2668     After the header is decrypted, the last 1 or 2 bytes in Buffer
   2669     SHOULD be the high-order word/byte of the CRC for the file being
   2670     decrypted, stored in Intel low-byte/high-byte order.  Versions of
   2671     PKZIP prior to 2.0 used a 2-byte CRC check; a 1-byte CRC check is
   2672     used on version 2.0 and later.  This check can be used to test
   2673     whether the supplied password is correct.
   2674 
   2675     6.1.7 Decrypting the compressed data stream
   2676     
   2677     The compressed data stream can be decrypted as follows:
   2678 
   2679     loop until done
   2680         read a character into C
   2681         Temp <- C ^ decrypt_byte()
   2682         update_keys(Temp)
   2683         output Temp
   2684     end loop
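Sections 6.1.5 through 6.1.7 combine into the following Python sketch. It repeats the key schedule so that it runs on its own. It decrypts the 12-byte header with the same key state used for the data that follows. It then compares the header's last 1 or 2 bytes with the high-order byte or word of the entry's CRC. An encrypt method is included only so that the routine can be exercised with a round trip; it applies the same key update to each plaintext byte. All names are hypothetical. As noted in 6.0.1, this form of encryption remains weak.

```python
from __future__ import annotations


def _make_crc_table() -> list[int]:
    """Reflected CRC-32 table (polynomial 0xEDB88320) used by ZIP."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


_CRC = _make_crc_table()


class TraditionalCipher:
    """Key state for traditional PKWARE encryption (6.1.5 through 6.1.7)."""

    def __init__(self, password: bytes) -> None:
        self.keys = [305419896, 591751049, 878082192]
        for b in password:
            self._update_keys(b)

    def _update_keys(self, c: int) -> None:
        k = self.keys
        # crc32(old_crc, char) taken as one raw table step, no inversions.
        k[0] = _CRC[(k[0] ^ c) & 0xFF] ^ (k[0] >> 8)
        # Keys are 32-bit unsigned, so reduce modulo 2**32.
        k[1] = ((k[1] + (k[0] & 0xFF)) * 134775813 + 1) & 0xFFFFFFFF
        k[2] = _CRC[(k[2] ^ (k[1] >> 24)) & 0xFF] ^ (k[2] >> 8)

    def _decrypt_byte(self) -> int:
        temp = (self.keys[2] | 2) & 0xFFFF        # "local unsigned short temp"
        return ((temp * (temp ^ 1)) >> 8) & 0xFF  # unsigned char result

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            p = c ^ self._decrypt_byte()
            self._update_keys(p)  # keys advance on the recovered plaintext
            out.append(p)
        return bytes(out)

    def encrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for p in data:
            out.append(p ^ self._decrypt_byte())
            self._update_keys(p)
        return bytes(out)


def crc_check_bytes(crc: int, two_byte: bool = False) -> bytes:
    """Expected tail of the decrypted header, low byte first."""
    if two_byte:  # PKZIP before 2.0: high-order word of the CRC
        return ((crc >> 16) & 0xFFFF).to_bytes(2, "little")
    return bytes([(crc >> 24) & 0xFF])  # high-order byte of the CRC


def decrypt_entry(password: bytes, data: bytes, check: bytes) -> bytes:
    """Decrypt the 12-byte header, verify its check bytes, then decrypt the rest."""
    cipher = TraditionalCipher(password)
    header = cipher.decrypt(data[:12])
    if header[12 - len(check):] != check:
        raise ValueError("password check failed")
    return cipher.decrypt(data[12:])
```

A wrong password passes the 1-byte check with a probability of about 1/256, so a successful check is only a hint. The CRC of the extracted data still needs to be verified.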
   2685 
   2686 
   2687 7.0 Strong Encryption Specification
   2688 -----------------------------------
   2689 
   2690    7.0.1 Portions of the Strong Encryption technology defined in this 
   2691    specification are covered under patents and pending patent applications.
   2692    Refer to the section in this document entitled "Incorporating 
   2693    PKWARE Proprietary Technology into Your Product" for more information.
   2694 
   2695 7.1 Strong Encryption Overview
   2696 ------------------------------
   2697 
   2698    7.1.1 Version 5.x of this specification introduced support for strong 
   2699    encryption algorithms.  These algorithms can be used with either 
   2700    a password or an X.509v3 digital certificate to encrypt each file. 
   2701    This format specification supports either password or certificate 
   2702    based encryption to meet the security needs of today, to enable 
   2703    interoperability between users within both PKI and non-PKI 
   2704    environments, and to ensure interoperability between different 
   2705    computing platforms that are running a ZIP program.  
   2706 
   2707    7.1.2 Password based encryption is the most common form of encryption 
   2708    people are familiar with.  However, inherent weaknesses with 
   2709    passwords (e.g. susceptibility to dictionary/brute force attack) 
   2710    as well as password management and support issues make certificate 
   2711    based encryption a more secure and scalable option.  Industry 
   2712    efforts and support are defining and moving towards more advanced 
   2713    security solutions built around X.509v3 digital certificates and 
   2714    Public Key Infrastructures (PKI) because of the greater scalability,
   2715    administrative options, and more robust security over traditional 
   2716    password based encryption. 
   2717 
   2718    7.1.3 Most standard encryption algorithms are supported with this
   2719    specification. Reference implementations for many of these 
   2720    algorithms are available from either commercial or open source 
   2721    distributors.  Readily available cryptographic toolkits make
   2722    implementation of the encryption features straightforward.
   2723    This document is not intended to provide a treatise on data 
   2724    encryption principles or theory.  Its purpose is to document the 
   2725    data structures required for implementing interoperable data 
   2726    encryption within the .ZIP format.  It is strongly recommended that 
   2727    you have a good understanding of data encryption before reading 
   2728    further.
   2729 
   2730    7.1.4 The algorithms introduced in Version 5.0 of this specification 
   2731    include:
   2732 
   2733       RC2 40 bit, 64 bit, and 128 bit
   2734       RC4 40 bit, 64 bit, and 128 bit
   2735       DES
   2736       3DES 112 bit and 168 bit
   2737   
   2738    Version 5.1 adds support for the following:
   2739 
   2740       AES 128 bit, 192 bit, and 256 bit
   2741 
   2742 
   2743    7.1.5 Version 6.1 introduces encryption data changes to support 
   2744    interoperability with Smartcard and USB Token certificate storage 
   2745    methods which do not support the OAEP strengthening standard.
   2746 
   2747    7.1.6 Version 6.2 introduces support for encrypting metadata by compressing 
   2748    and encrypting the central directory data structure to reduce information 
   2749    leakage.   Information leakage can occur in legacy ZIP applications 
   2750    through exposure of information about a file even though that file is 
   2751    stored encrypted.  The information exposed consists of file 
   2752    characteristics stored within the records and fields defined by this 
   2753    specification.  This includes data such as a file's name, its original 
   2754    size, timestamp and CRC32 value. 
   2755 
   2756    7.1.7 Version 6.3 introduces support for encrypting data using the Blowfish
   2757    and Twofish algorithms.  These are symmetric block ciphers developed 
   2758    by Bruce Schneier.  Blowfish supports using a variable length key from 
   2759    32 to 448 bits.  Block size is 64 bits.  Implementations SHOULD use 16
   2760    rounds and the only mode supported within ZIP files is CBC. Twofish 
   2761    supports key sizes 128, 192 and 256 bits.  Block size is 128 bits.  
   2762    Implementations SHOULD use 16 rounds and the only mode supported within
   2763    ZIP files is CBC.  Information and source code for both Blowfish and 
   2764    Twofish algorithms can be found on the internet.  Consult with the author
   2765    of these algorithms for information on terms or restrictions on use.
   2766 
   2767    7.1.8 Central Directory Encryption provides greater protection against 
   2768    information leakage by encrypting the Central Directory structure and 
   2769    by masking key values that are replicated in the unencrypted Local 
   2770    Header.   ZIP compatible programs that cannot interpret an encrypted 
   2771    Central Directory structure cannot rely on the data in the corresponding 
   2772    Local Header for decompression information.  
   2773 
   2774    7.1.9 Extra Field records that MAY contain information about a file that SHOULD 
   2775    not be exposed SHOULD NOT be stored in the Local Header and SHOULD only 
   2776    be written to the Central Directory where they can be encrypted.  This 
   2777    design currently does not support streaming.  The End of Central
   2778    Directory record, the Zip64 End of Central Directory Locator, and the
   2779    Zip64 End of Central Directory record are not encrypted.  Viewing any
   2780    files, or any information about the files, in a ZIP file with an
   2781    encrypted Central Directory requires the appropriate password or
   2782    private key for decryption.
   2783 
   2784    7.1.10 Older ZIP compatible programs not familiar with the Central Directory 
   2785    Encryption feature will no longer be able to recognize the Central 
   2786    Directory and MAY assume the ZIP file is corrupt.  Programs that 
   2787    attempt streaming access using Local Headers will see invalid 
   2788    information for each file.  Central Directory Encryption need not be 
   2789    used for every ZIP file.  Its use is recommended for greater security.  
   2790    ZIP files not using Central Directory Encryption SHOULD operate as 
   2791    in the past. 
   2792 
   2793    7.1.11 This strong encryption feature specification is intended to provide for 
   2794    scalable, cross-platform encryption needs ranging from simple password
   2795    encryption to authenticated public/private key encryption.  
   2796 
   2797    7.1.12 Encryption provides data confidentiality and privacy.  It is 
   2798    recommended that you combine X.509 digital signing with encryption 
   2799    to add authentication and non-repudiation.
   2800 
   2801 
   2802 7.2 Single Password Symmetric Encryption Method
   2803 -----------------------------------------------
   2804 
   2805    7.2.1 The Single Password Symmetric Encryption Method using strong    
   2806    encryption algorithms operates similarly to the traditional 
   2807    PKWARE encryption defined in this format.  Additional data 
   2808    structures are added to support the processing needs of the 
   2809    strong algorithms.
   2810 
   2811    The Strong Encryption data structures are:
   2812 
   2813    7.2.2 General Purpose Bits - Bits 0 and 6 of the General Purpose bit 
   2814    flag in both local and central header records.  Both bits set 
   2815    indicates strong encryption.  Bit 13, when set, indicates the Central
   2816    Directory is encrypted and that selected fields in the Local Header
   2817    are masked to hide their actual value.
   2818 
   2819 
   2820     7.2.3 Extra Field 0x0017 in central header only.
   2821 
   2822     Fields to consider in this record are:
   2823 
   2824        7.2.3.1 Format - the data format identifier for this record.  The only
   2825        value allowed at this time is the integer value 2.
   2826 
   2827        7.2.3.2 AlgId - integer identifier of the encryption algorithm from the
   2828        following range
   2829 
   2830                  0x6601 - DES
   2831                  0x6602 - RC2 (version needed to extract < 5.2)
   2832                  0x6603 - 3DES 168
   2833                  0x6609 - 3DES 112
   2834                  0x660E - AES 128 
   2835                  0x660F - AES 192 
   2836                  0x6610 - AES 256 
   2837                  0x6702 - RC2 (version needed to extract >= 5.2)
   2838                  0x6720 - Blowfish
   2839                  0x6721 - Twofish
   2840                  0x6801 - RC4
   2841                  0xFFFF - Unknown algorithm
   2842 
   2843        7.2.3.3 Bitlen - Explicit bit length of key
   2844 
   2845                  32 - 448 bits
   2846            
   2847        7.2.3.4 Flags - Processing flags needed for decryption
   2848 
   2849                  0x0001 - Password is required to decrypt
   2850                  0x0002 - Certificates only
   2851                  0x0003 - Password or certificate required to decrypt
   2852 
   2853                  Values > 0x0003 reserved for certificate processing
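The following Python sketch decodes the fields listed above from a 0x0017 record. It assumes the four fields appear in the order listed, each as a 2-byte little-endian value. This matches the sizes given for the same fields in the decryption header in 7.2.4. The input is the record's data after its tag and size fields, and any bytes after Flags are returned uninterpreted. The names parse_0x0017 and StrongEncryptionHeader are hypothetical.

```python
from __future__ import annotations
import struct
from typing import NamedTuple

ALG_NAMES = {
    0x6601: "DES",
    0x6602: "RC2 (version needed to extract < 5.2)",
    0x6603: "3DES 168",
    0x6609: "3DES 112",
    0x660E: "AES 128",
    0x660F: "AES 192",
    0x6610: "AES 256",
    0x6702: "RC2 (version needed to extract >= 5.2)",
    0x6720: "Blowfish",
    0x6721: "Twofish",
    0x6801: "RC4",
    0xFFFF: "Unknown algorithm",
}


class StrongEncryptionHeader(NamedTuple):
    format_id: int
    alg_id: int
    bitlen: int
    flags: int   # 0x0001 password, 0x0002 certificates only, 0x0003 either
    rest: bytes  # trailing certificate data, left uninterpreted


def parse_0x0017(payload: bytes) -> StrongEncryptionHeader:
    """Decode Format, AlgId, Bitlen and Flags from a 0x0017 record's data."""
    if len(payload) < 8:
        raise ValueError("0x0017 record too short")
    format_id, alg_id, bitlen, flags = struct.unpack_from("<HHHH", payload, 0)
    if format_id != 2:
        raise ValueError(f"unsupported Format {format_id}, expected 2")
    if alg_id not in ALG_NAMES:
        raise ValueError(f"AlgId 0x{alg_id:04X} not in the listed range")
    if not 32 <= bitlen <= 448:
        raise ValueError(f"Bitlen {bitlen} outside 32..448")
    return StrongEncryptionHeader(format_id, alg_id, bitlen, flags, payload[8:])
```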
   2854 
   2855 
   2856    7.2.4 Decryption header record preceding compressed file data.
   2857 
   2858                  -Decryption Header:
   2859 
   2860                   Value     Size     Description
   2861                   -----     ----     -----------
   2862                   IVSize    2 bytes  Size of initialization vector (IV)
   2863                   IVData    IVSize   Initialization vector for this file
   2864                   Size      4 bytes  Size of remaining decryption header data
   2865                   Format    2 bytes  Format definition for this record
   2866                   AlgID     2 bytes  Encryption algorithm identifier
   2867                   Bitlen    2 bytes  Bit length of encryption key
   2868                   Flags     2 bytes  Processing flags
   2869                   ErdSize   2 bytes  Size of Encrypted Random Data
   2870                   ErdData   ErdSize  Encrypted Random Data
   2871                   Reserved1 4 bytes  Reserved certificate processing data
   2872                   Reserved2 (var)    Reserved for certificate processing data
   2873                   VSize     2 bytes  Size of password validation data
   2874                   VData     VSize-4  Password validation data
   2875                   VCRC32    4 bytes  Standard ZIP CRC32 of password validation data
   2876 
   2877        7.2.4.1 IVData - The size of the IV SHOULD match the algorithm block size.
   2878        The IVData can be completely random data.  If the size of
   2879        the randomly generated data does not match the block size,
   2880        it SHOULD be padded with zeros or truncated as
   2881        necessary.  If IVSize is 0, then IV = CRC32 + Uncompressed
   2882        File Size (as a 64 bit little-endian, unsigned integer value).
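The IV rule above leaves some room for interpretation. The Python sketch below reads "CRC32 + Uncompressed File Size" as the 4-byte little-endian CRC-32 followed by the 8-byte little-endian size. It then pads the result with zero bytes or truncates it to the block size, as the same paragraph prescribes for random IV data. That reading is an assumption. derive_iv is a hypothetical name.

```python
import struct


def derive_iv(iv_data: bytes, block_size: int,
              crc32: int = 0, uncompressed_size: int = 0) -> bytes:
    """Return an IV of exactly block_size bytes, per one reading of 7.2.4.1."""
    if not iv_data:
        # ASSUMPTION: "CRC32 + Uncompressed File Size" means the 4-byte
        # little-endian CRC followed by the 8-byte little-endian size.
        iv_data = struct.pack("<IQ", crc32, uncompressed_size)
    # Pad with zero bytes or truncate so the IV matches the block size.
    return iv_data[:block_size].ljust(block_size, b"\x00")
```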
   2883 
   2884        7.2.4.2 Format - the data format identifier for this record.  The only
   2885        value allowed at this time is the integer value 3.
   2886 
   2887        7.2.4.3 AlgId - integer identifier of the encryption algorithm from the
   2888        following range
   2889 
   2890                      0x6601 - DES
   2891                      0x6602 - RC2 (version needed to extract < 5.2)
   2892                      0x6603 - 3DES 168
   2893                      0x6609 - 3DES 112
   2894                      0x660E - AES 128 
   2895                      0x660F - AES 192 
   2896                      0x6610 - AES 256 
   2897                      0x6702 - RC2 (version needed to extract >= 5.2)
   2898                      0x6720 - Blowfish
   2899                      0x6721 - Twofish
   2900                      0x6801 - RC4
   2901                      0xFFFF - Unknown algorithm
   2902 
   2903         7.2.4.4 Bitlen - Explicit bit length of key
   2904 
   2905                      32 - 448 bits
   2906                
   2907         7.2.4.5 Flags - Processing flags needed for decryption
   2908 
   2909                      0x0001 - Password is required to decrypt
   2910                      0x0002 - Certificates only
   2911                      0x0003 - Password or certificate required to decrypt
   2912 
   2913                      Values > 0x0003 reserved for certificate processing
   2914 
   2915         7.2.4.6 ErdData - Encrypted random data is used to store random data that
   2916         is used to generate a file session key for encrypting 
   2917         each file.  SHA1 is used to calculate hash data used to 
   2918         derive keys.  File session keys are derived from a master 
   2919         session key generated from the user-supplied password.
   2920         If the Flags field in the decryption header contains 
   2921         the value 0x4000, then the ErdData field MUST be 
   2922         decrypted using 3DES. If the value 0x4000 is not set,
   2923         then the ErdData field MUST be decrypted using AlgId.
   2924 
   2925 
   2926         7.2.4.7 Reserved1 - Reserved for certificate processing.  If the value is
   2927         zero, then Reserved2 data is absent.  See the explanation
   2928         under the Certificate Processing Method for details on
   2929         this data structure.
   2930 
   2931         7.2.4.8 Reserved2 - If present, the size of the Reserved2 data structure 
   2932         is located by skipping the first 4 bytes of this field 
   2933         and using the next 2 bytes as the remaining size.  See
   2934         the explanation under the Certificate Processing Method
   2935         for details on this data structure.
   2936 
   2937         7.2.4.9 VSize - This size value will always include the 4 bytes of the
   2938         VCRC32 data and will be greater than 4 bytes.
   2939 
   2940         7.2.4.10 VData - Random data for password validation.  VData is
   2941         VSize - 4 bytes long, so VData and VCRC32 together occupy VSize
   2942         bytes, and VSize MUST be a multiple of the encryption block size.
   2943         VCRC32 is the standard ZIP CRC32 of VData.  VData and VCRC32 are
   2944         stored encrypted and start the stream of encrypted data for a file.
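A minimal Python parser for the decryption header follows the field table in 7.2.4. It assumes Intel low-byte/high-byte order for all integers. It handles only the password case, where Reserved1 is zero, because the Reserved2 layout belongs to the Certificate Processing Method. The parser stops at VSize and reports the offset where the encrypted VData and VCRC32 begin. Decrypting them needs the file session key and a cipher, which are outside this sketch. The names are hypothetical.

```python
from __future__ import annotations
import struct
from typing import NamedTuple


class DecryptionHeader(NamedTuple):
    iv: bytes
    size: int         # "Size of remaining decryption header data" (not checked)
    format_id: int
    alg_id: int
    bitlen: int
    flags: int
    erd_data: bytes
    v_size: int       # bytes of encrypted VData + VCRC32 that follow
    data_offset: int  # offset in buf where the encrypted VData begins


def parse_decryption_header(buf: bytes) -> DecryptionHeader:
    """Parse a password-only decryption header (Reserved1 == 0)."""
    pos = 0

    def take(fmt: str) -> tuple:
        nonlocal pos
        values = struct.unpack_from(fmt, buf, pos)  # raises struct.error if short
        pos += struct.calcsize(fmt)
        return values

    def take_bytes(n: int) -> bytes:
        nonlocal pos
        chunk = buf[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("truncated decryption header")
        pos += n
        return chunk

    (iv_size,) = take("<H")
    iv = take_bytes(iv_size)
    size, format_id, alg_id, bitlen, flags, erd_size = take("<IHHHHH")
    erd_data = take_bytes(erd_size)
    (reserved1,) = take("<I")
    if reserved1 != 0:
        # Reserved2 is defined under the Certificate Processing Method.
        raise NotImplementedError("certificate processing data not supported")
    (v_size,) = take("<H")
    if format_id != 3:
        raise ValueError(f"unexpected Format {format_id}, expected 3")
    if v_size <= 4:
        raise ValueError("VSize must be greater than 4")
    return DecryptionHeader(iv, size, format_id, alg_id, bitlen, flags,
                            erd_data, v_size, pos)


def erd_uses_3des(header: DecryptionHeader) -> bool:
    """Flag 0x4000 selects 3DES for ErdData; otherwise AlgId applies."""
    return bool(header.flags & 0x4000)
```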
   2945 
   2946 
   2947     7.2.5 Useful Tips
   2948 
   2949         7.2.5.1 Strong Encryption is always applied to a file after compression. The
   2950         block-oriented algorithms all operate in Cipher Block Chaining (CBC)
   2951         mode.  The block size used for AES and Twofish encryption is 16 bytes.
   2952         All other block algorithms use a block size of 8 bytes.  Two IDs are
   2953         defined for RC2 to account for a discrepancy found in the
   2954         implementation of the RC2 algorithm in the cryptographic library on
   2955         Windows XP SP1 and all earlier versions of Windows.  It is recommended
   2956         that zero-length files not be encrypted; however, programs SHOULD be
   2957         prepared to extract them if they are found within a ZIP file.
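The following Python sketch combines the block-size rule in 7.2.5.1 and the VSize requirement in 7.2.4.10 into a small lookup. The Twofish entry uses the 128-bit block size stated in 7.1.7, and the Blowfish entry uses the 64-bit block size stated there. RC4 is a stream cipher, so its entry of 1 is a placeholder chosen for this sketch, not a value from this specification. check_validation_size is a hypothetical name.

```python
# Cipher block sizes in bytes, keyed by AlgId.  AES and Twofish use 128-bit
# blocks; DES, 3DES, RC2 and Blowfish use 64-bit blocks.
BLOCK_SIZE = {
    0x6601: 8,   # DES
    0x6602: 8,   # RC2 (version needed to extract < 5.2)
    0x6603: 8,   # 3DES 168
    0x6609: 8,   # 3DES 112
    0x660E: 16,  # AES 128
    0x660F: 16,  # AES 192
    0x6610: 16,  # AES 256
    0x6702: 8,   # RC2 (version needed to extract >= 5.2)
    0x6720: 8,   # Blowfish
    0x6721: 16,  # Twofish
    0x6801: 1,   # RC4 is a stream cipher; 1 is a placeholder, not a block size
}


def check_validation_size(alg_id: int, v_size: int) -> int:
    """Return the block size; raise if VData + VCRC32 cannot fill whole blocks."""
    block = BLOCK_SIZE.get(alg_id)
    if block is None:
        raise ValueError(f"unknown AlgId 0x{alg_id:04X}")
    if v_size <= 4 or v_size % block:
        raise ValueError(f"VSize {v_size} is not a valid multiple of {block}")
    return block
```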
   2958 
   2959         7.2.5.2 A pseudo-code representation of the encryption process is as follows:
   2960 
   2961             Password = GetUserPassword()
   2962             MasterSessionKey = DeriveKey(SHA1(Password)) 
   2963             RD = CryptographicStrengthRandomData() 
   2964             For Each File
   2965                IV = CryptographicStrengthRandomData() 
   2966                VData = CryptographicStrengthRandomData()
   2967                VCRC32 = CRC32(VData)
   2968                FileSessionKey = DeriveKey(SHA1(IV + RD))
   2969                ErdData = Encrypt(RD,MasterSessionKey,IV) 
   2970                Encrypt(VData + VCRC32 + FileData, FileSessionKey,IV)
   2971             Done
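The pseudo-code above can be turned into a runnable Python skeleton, with its assumptions labeled in the code. This section does not define DeriveKey. The sketch models it on the key derivation documented for Microsoft CryptoAPI's CryptDeriveKey, because 7.2.5.3 names CryptoAPI as a workable toolkit. That derivation computes SHA-1 over the digest XORed into 64-byte 0x36 and 0x5C pads and concatenates the two results. The lengths chosen for RD and VData are also assumptions. The cipher is a caller-supplied hook, because Python's standard library has no block ciphers, and any CBC padding is left to that hook.

```python
from __future__ import annotations
import hashlib
import os
import zlib
from typing import Callable

# HYPOTHETICAL hook: encrypt(data, key, iv) -> ciphertext.  Supply a real CBC
# cipher (for example AES-CBC from a third-party toolkit).
Encryptor = Callable[[bytes, bytes, bytes], bytes]


def derive_key(digest: bytes) -> bytes:
    """ASSUMPTION: CryptDeriveKey-style expansion of a SHA-1 digest to 40 bytes."""
    buf = digest.ljust(64, b"\x00")
    inner = hashlib.sha1(bytes(b ^ 0x36 for b in buf)).digest()
    outer = hashlib.sha1(bytes(b ^ 0x5C for b in buf)).digest()
    return inner + outer


def encrypt_files(password: bytes, files: list[bytes], encrypt: Encryptor,
                  key_len: int = 32,
                  block_size: int = 16) -> list[tuple[bytes, bytes, bytes]]:
    """Follow the 7.2.5.2 pseudo-code; return (IV, ErdData, payload) per file."""
    if key_len > 40:
        raise ValueError("this derive_key sketch yields at most 40 key bytes")
    master_key = derive_key(hashlib.sha1(password).digest())[:key_len]
    rd = os.urandom(block_size)  # ASSUMPTION: RD is one block long
    results = []
    for file_data in files:
        iv = os.urandom(block_size)
        # ASSUMPTION: VData sized so VData + VCRC32 fill exactly two blocks.
        vdata = os.urandom(2 * block_size - 4)
        vcrc32 = zlib.crc32(vdata).to_bytes(4, "little")  # ASSUMPTION: little-endian
        file_key = derive_key(hashlib.sha1(iv + rd).digest())[:key_len]
        erd_data = encrypt(rd, master_key, iv)
        payload = encrypt(vdata + vcrc32 + file_data, file_key, iv)
        results.append((iv, erd_data, payload))
    return results
```

Each returned tuple holds the values that would go into the file's decryption header (IV and ErdData). It also holds the encrypted stream, which begins with VData and VCRC32.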
   2972 
   2973         7.2.5.3 The function names and parameter requirements will depend on
   2974         the choice of the cryptographic toolkit selected.  Almost any
   2975         toolkit supporting the reference implementations for each
   2976         algorithm can be used.  The RSA BSAFE(r), OpenSSL, and Microsoft
   2977         CryptoAPI libraries are all known to work well.  
   2978 
   2979 
   2980 7.3 Single Password - Central Directory Encryption
   2981 --------------------------------------------------
   2982         
   2983     7.3.1 Central Directory Encryption is achieved within the .ZIP format by 
   2984     encrypting the Central Directory structure.  This encapsulates the metadata 
   2985     most often used for processing .ZIP files.  Additional metadata is stored for 
   2986     redundancy in the Local Header for each file.  The process of concealing 
   2987     metadata by encrypting the Central Directory does not protect the data within 
   2988     the Local Header.  To avoid information leakage from the exposed metadata 
   2989     in the Local Header, the fields containing information about a file are masked.  
   2990 
   2991     7.3.2 Local Header
   2992 
   2993     Masking replaces the true content of the fields for a file in the Local 
   2994     Header with false information.  When masked, the Local Header is not 
   2995     suitable for streaming access, and the options for data recovery of damaged
   2996     archives are reduced.  Extra Data fields that MAY contain confidential
   2997     data SHOULD NOT be stored within the Local Header.  The value set into
   2998     the Version needed to extract field SHOULD be the correct value needed to
   2999     extract the file without regard to Central Directory Encryption. The fields 
   3000     within the Local Header targeted for masking when the Central Directory is 
   3001     encrypted are:
   3002 
   3003             Field Name                     Mask Value
   3004             ------------------             ---------------------------
   3005             compression method              0
   3006             last mod file time              0
   3007             last mod file date              0
   3008             crc-32                          0
   3009             compressed size                 0
   3010             uncompressed size               0
   3011             file name (variable size)       Base 16 value from the
   3012                                             range 1 - 0xFFFFFFFFFFFFFFFF
   3013                                             represented as a string whose
   3014                                             size will be set into the
   3015                                             file name length field
   3016 
   3017     The Base 16 value assigned as a masked file name is simply a sequentially
   3018     incremented value for each file starting with 1 for the first file.  
   3019     Modifications to a ZIP file MAY cause different values to be stored for 
   3020     each file.  For compatibility, the file name field in the Local Header 
   3021     SHOULD NOT be left blank.  As of Version 6.2 of this specification, 
   3022     the Compression Method and Compressed Size fields are not yet masked.
   3023     Fields having a value of 0xFFFF or 0xFFFFFFFF for the ZIP64 format
   3024     SHOULD NOT be masked.  
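The following Python sketch masks the Local Header fields for one file. Following the note above, it leaves the compression method and compressed size unmasked, as of Version 6.2. It also keeps a ZIP64 marker of 0xFFFFFFFF in the uncompressed size rather than masking it. The specification says only "Base 16 value", so the use of upper-case ASCII hex digits is an assumption. The file name length field would then be set to len(file_name). LocalHeaderFields and mask_local_header are hypothetical names.

```python
from __future__ import annotations
from dataclasses import dataclass, replace

ZIP64_SIZE_MARKER = 0xFFFFFFFF


@dataclass(frozen=True)
class LocalHeaderFields:
    compression_method: int
    last_mod_time: int
    last_mod_date: int
    crc32: int
    compressed_size: int
    uncompressed_size: int
    file_name: bytes


def mask_local_header(h: LocalHeaderFields, index: int) -> LocalHeaderFields:
    """Mask the Local Header of the index-th file (1-based) as in 7.3.2."""
    if index < 1:
        raise ValueError("masked names start at 1")
    return replace(
        h,
        # compression_method and compressed_size are left unchanged: as of
        # Version 6.2 they are not yet masked.
        last_mod_time=0,
        last_mod_date=0,
        crc32=0,
        # A ZIP64 marker value is kept rather than masked.
        uncompressed_size=(h.uncompressed_size
                           if h.uncompressed_size == ZIP64_SIZE_MARKER else 0),
        # ASSUMPTION: upper-case hex digits, encoded as ASCII.
        file_name=format(index, "X").encode("ascii"),
    )
```

For example, the 26th file in the archive gets the masked name 1A.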
   3025 
   3026     7.3.3 Encrypting the Central Directory
   3027 
   3028     Encryption of the Central Directory does not include encryption of the 
   3029     Central Directory Signature data, the Zip64 End of Central Directory
   3030     record, the Zip64 End of Central Directory Locator, or the End
   3031     of Central Directory record.  The ZIP file comment data is never
   3032     encrypted.
   3033 
   3034     Before encrypting the Central Directory, it MAY optionally be compressed.
   3035     Compression is not required, but for storage efficiency it is assumed
   3036     this structure will be compressed before encrypting.  Similarly, this 
   3037     specification supports compressing the Central Directory without
   3038     requiring that it also be encrypted.  Early implementations of this
   3039     feature will assume the encryption method applied to files matches the 
   3040     encryption applied to the Central Directory.
   3041 
   3042     Encryption of the Central Directory is done in a manner similar to
   3043     that of file encryption.  The encrypted data is preceded by a 
   3044     decryption header.  The decryption header is known as the Archive
   3045     Decryption Header.  The fields of this record are identical to
   3046     the decryption header preceding each encrypted file.  The location
   3047     of the Archive Decryption Header is determined by the value in the
   3048     Start of the Central Directory field in the Zip64 End of Central
   3049     Directory record.  When the Central Directory is encrypted, the
   3050     Zip64 End of Central Directory record will always be present.
   3051 
   3052     The layout of the Zip64 End of Central Directory record for all
   3053     versions starting with 6.2 of this specification will follow the
   3054     Version 2 format.  The Version 2 format is as follows:
   3055 
   3056     The leading fixed size fields within the Version 1 format for this
   3057     record remain unchanged.  The record signature for both Version 1 
   3058     and Version 2 will be 0x06064b50.  Immediately following the last
   3059     byte of the field known as the Offset of Start of Central 
   3060     Directory With Respect to the Starting Disk Number will begin the 
   3061     new fields defining Version 2 of this record.  
   3062 
   3063     7.3.4 New fields for Version 2
   3064 
   3065     Note: all fields stored in Intel low-byte/high-byte order.
   3066 
   3067               Value                 Size       Description
   3068               -----                 ----       -----------
   3069               Compression Method    2 bytes    Method used to compress the
   3070                                                Central Directory
   3071               Compressed Size       8 bytes    Size of the compressed data
   3072               Original Size         8 bytes    Original uncompressed size
   3073               AlgId                 2 bytes    Encryption algorithm ID
   3074               BitLen                2 bytes    Encryption key length
   3075               Flags                 2 bytes    Encryption flags
   3076               HashID                2 bytes    Hash algorithm identifier
   3077               Hash Length           2 bytes    Length of hash data
   3078               Hash Data             (variable) Hash data
   3079 
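The Version 2 fields can be read directly after the fixed Version 1 prefix. This Python sketch assumes the 56-byte Version 1 layout given elsewhere in this specification: signature, record size, version made by, version needed, two disk numbers, two entry counts, Central Directory size, and offset of the start of the Central Directory. It returns that offset, which, as described above, locates the Archive Decryption Header. The class and function names are illustrative only.

```python
import struct
from dataclasses import dataclass

ZIP64_EOCD_SIG = 0x06064B50
V1_FMT = "<IQHHIIQQQQ"  # Version 1 fixed fields, 56 bytes, little-endian
V2_FMT = "<HQQHHHHH"    # Version 2 fields up to Hash Length, 28 bytes

@dataclass
class Zip64EocdV2:
    cd_offset: int            # Offset of Start of Central Directory
    compression_method: int
    compressed_size: int
    original_size: int
    alg_id: int
    bit_len: int
    flags: int
    hash_id: int
    hash_data: bytes

def parse_zip64_eocd_v2(buf: bytes) -> Zip64EocdV2:
    v1 = struct.unpack_from(V1_FMT, buf, 0)
    if v1[0] != ZIP64_EOCD_SIG:
        raise ValueError("not a Zip64 End of Central Directory record")
    pos = struct.calcsize(V1_FMT)
    (method, csize, osize, alg_id, bit_len,
     flags, hash_id, hash_len) = struct.unpack_from(V2_FMT, buf, pos)
    pos += struct.calcsize(V2_FMT)
    hash_data = buf[pos:pos + hash_len]  # variable-length Hash Data
    if len(hash_data) != hash_len:
        raise ValueError("Hash Data is truncated")
    return Zip64EocdV2(v1[9], method, csize, osize, alg_id,
                       bit_len, flags, hash_id, hash_data)
```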
   3080      The Compression Method accepts the same range of values as the 
   3081      corresponding field in the Central Header.
   3082 
   3083      The Compressed Size and Original Size values will not include the
   3084      data of the Central Directory Signature which is compressed or
   3085      encrypted.
   3086 
   3087      The AlgId, BitLen, and Flags fields accept the same range of values
   3088      as the corresponding fields within the 0x0017 record.
   3089 
   3090      Hash ID identifies the algorithm used to hash the Central Directory 
   3091      data.  This data does not have to be hashed, in which case the
   3092      values for both the HashID and Hash Length will be 0.  Possible 
   3093      values for HashID are:
   3094 
   3095              Value           Algorithm
   3096              -----           ---------
   3097              0x0000          none
   3098              0x0001          CRC32
   3099              0x8003          MD5
   3100              0x8004          SHA1
   3101              0x8007          RIPEMD160
   3102              0x800C          SHA256
   3103              0x800D          SHA384
   3104              0x800E          SHA512
   3105 
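A reader checking the Hash Data could dispatch on HashID roughly as follows. This sketch maps the table's values onto Python's hashlib names. This section does not state which bytes are hashed (for example, before or after compression and encryption), so the caller supplies them. The little-endian storage of a CRC32 value is also an assumption. RIPEMD160 is only available when the underlying OpenSSL build provides it.

```python
import hashlib
import zlib

# HashID values from the table above mapped to hashlib names (CRC32 handled separately).
HASH_NAMES = {
    0x8003: "md5",
    0x8004: "sha1",
    0x8007: "ripemd160",  # may raise ValueError if the OpenSSL build lacks it
    0x800C: "sha256",
    0x800D: "sha384",
    0x800E: "sha512",
}

def central_directory_hash(hash_id: int, cd_bytes: bytes) -> bytes:
    """Compute Hash Data for a HashID; 0x0000 (none) yields empty data."""
    if hash_id == 0x0000:
        return b""
    if hash_id == 0x0001:
        # Assumption: CRC32 stored little-endian like the other fields.
        return zlib.crc32(cd_bytes).to_bytes(4, "little")
    name = HASH_NAMES.get(hash_id)
    if name is None:
        raise ValueError(f"unknown HashID 0x{hash_id:04X}")
    return hashlib.new(name, cd_bytes).digest()

def verify_central_directory(hash_id: int, hash_data: bytes, cd_bytes: bytes) -> bool:
    return central_directory_hash(hash_id, cd_bytes) == hash_data
```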
   3106      7.3.5 When the Central Directory data is signed, the same hash algorithm
   3107      used to hash the Central Directory for signing SHOULD be used.
   3108      This is recommended for processing efficiency; however, it is
   3109      permissible for any of the above algorithms to be used independent 
   3110      of the signing process.
   3111 
   3112      The Hash Data will contain the hash data for the Central Directory.
   3113      The length of this data will vary depending on the algorithm used.
   3114 
   3115      The Version Needed to Extract SHOULD be set to 62.
   3116 
   3117      The value for the Total Number of Entries on the Current Disk will
   3118      be 0.  These records will no longer support random access when
   3119      encrypting the Central Directory.
   3120 
   3121      7.3.6 When the Central Directory is compressed and/or encrypted, the
   3122      End of Central Directory record will store the value 0xFFFFFFFF
   3123      as the value for the Total Number of Entries in the Central
   3124      Directory.  The value stored in the Total Number of Entries in
   3125      the Central Directory on this Disk field will be 0.  The actual
   3126      values will be stored in the equivalent fields of the Zip64
   3127      End of Central Directory record.
   3128 
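A reader therefore takes the entry counts from the Zip64 End of Central Directory record whenever the End of Central Directory record carries the marker value. Elsewhere in this specification, the Total Number of Entries field of that record is only 2 bytes wide, so the sketch below assumes the marker is read as the 16-bit all-ones value 0xFFFF. The record layout used here (a 22-byte fixed portion starting with signature 0x06054b50) and the function name are for illustration.

```python
import struct

EOCD_SIG = 0x06054B50
EOCD_FMT = "<IHHHHIIH"  # fixed 22-byte portion; the ZIP file comment follows

def entry_counts(eocd: bytes, zip64_on_disk: int, zip64_total: int) -> tuple[int, int]:
    """Return (entries on this disk, total entries), preferring Zip64 values when flagged."""
    (sig, _disk, _cd_disk, on_disk, total,
     _cd_size, _cd_offset, _comment_len) = struct.unpack_from(EOCD_FMT, eocd, 0)
    if sig != EOCD_SIG:
        raise ValueError("not an End of Central Directory record")
    if total == 0xFFFF:
        # All-ones marker: the actual counts are in the Zip64 EOCD record
        # (for a compressed/encrypted Central Directory, on-disk is 0 there too).
        return zip64_on_disk, zip64_total
    return on_disk, total
```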
   3129      7.3.7 Decrypting and decompressing the Central Directory is accomplished
   3130      in the same manner as decrypting and decompressing a file.
   3131 
   3132  7.4 Certificate Processing Method
   3133  ---------------------------------
   3134 
   3135     The Certificate Processing Method for ZIP file encryption 
   3136     defines the following additional data fields:
   3137 
   3138     7.4.1 Certificate Flag Values
   3139 
   3140     Additional processing flags that can be present in the Flags field of both 
   3141     the 0x0017 field of the central directory Extra Field and the Decryption 
   3142     header record preceding compressed file data are:
   3143 
   3144          0x0007 - reserved for future use
   3145          0x000F - reserved for future use
   3146          0x0100 - Indicates non-OAEP key wrapping was used.  If this
   3147                   field is set, the version needed to extract MUST
   3148                   be at least 61.  This means OAEP key wrapping is not
   3149                   used when generating a Master Session Key using
   3150                   ErdData.
   3151          0x4000 - ErdData MUST be decrypted using 3DES-168; otherwise, use the
   3152                   same algorithm used for encrypting the file contents.
   3153          0x8000 - reserved for future use
   3154 
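The flag values above can be tested as individual bits. This sketch assumes that 0x0100 and 0x4000 are independent bit flags within the 2-byte Flags field. It also enforces the minimum version needed to extract of 61 for non-OAEP key wrapping. The dictionary keys and function name are hypothetical.

```python
NON_OAEP_KEY_WRAP = 0x0100  # non-OAEP key wrapping was used
ERD_3DES_168 = 0x4000       # ErdData MUST be decrypted using 3DES-168

def decode_cert_flags(flags: int, version_needed: int) -> dict[str, bool]:
    """Decode certificate-processing bits from a 0x0017 or decryption header Flags value."""
    non_oaep = bool(flags & NON_OAEP_KEY_WRAP)
    if non_oaep and version_needed < 61:
        raise ValueError("non-OAEP key wrapping requires version needed to extract >= 61")
    return {
        "non_oaep_key_wrap": non_oaep,
        # When clear, ErdData uses the same algorithm as the file contents.
        "erd_uses_3des_168": bool(flags & ERD_3DES_168),
    }
```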
   3155 
   3156     7.4.2 CertData - Extra Field 0x0017 record certificate data structure
   3157 
   3158     The data structure used to store certificate data within the section
   3159     of the Extra Field defined by the CertData field of the 0x0017
   3160     record is as shown:
   3161 
   3162           Value     Size     Description
   3163           -----     ----     -----------
   3164           RCount    4 bytes  Number of recipients.  
   3165           HashAlg   2 bytes  Hash algorithm identifier
   3166           HSize     2 bytes  Hash size
   3167           SRList    (var)    Simple list of recipients' hashed public keys
   3168 
   3169                           
   3170          RCount    This defines the number of intended recipients whose
   3171                    public keys were used for encryption.  This identifies
   3172                    the number of elements in the SRList.
   3173 
   3174          HashAlg   This defines the hash algorithm used to calculate
   3175                    the public key hash of each public key used
   3176                    for encryption. This field currently supports
   3177                    only the following value for SHA-1
   3178 
   3179                    0x8004 - SHA1
   3180 
   3181          HSize     This defines the size of a hashed public key.
   3182 
   3183          SRList    This is a variable length list of the hashed 
   3184                    public keys for each intended recipient.  Each 
   3185                   element in this list is HSize bytes.  The total size of
   3186                    SRList is determined using RCount * HSize.
   3187 
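A minimal CertData parser might look like this. This section does not state the byte order, so the sketch assumes the Intel low-byte/high-byte order used by the other fields. It slices SRList into RCount entries of HSize bytes each and accepts only HashAlg 0x8004 (SHA1), as described above. The names are illustrative.

```python
import struct

SHA1_HASH_ALG = 0x8004

def parse_cert_data(buf: bytes) -> tuple[int, list[bytes]]:
    """Return (HashAlg, SRList) where SRList holds RCount hashed public keys."""
    rcount, hash_alg, hsize = struct.unpack_from("<IHH", buf, 0)
    if hash_alg != SHA1_HASH_ALG:
        raise ValueError(f"unsupported HashAlg 0x{hash_alg:04X}")
    if hsize == 0:
        raise ValueError("HSize must be non-zero")
    start = struct.calcsize("<IHH")  # 8 bytes of fixed fields
    end = start + rcount * hsize     # total SRList size is RCount * HSize
    if end > len(buf):
        raise ValueError("SRList extends past the end of CertData")
    return hash_alg, [buf[i:i + hsize] for i in range(start, end, hsize)]
```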
   3188 
   3189     7.4.3 Reserved1 - Certificate Decryption Header Reserved1 Data
   3190 
   3191           Value     Size     Description
   3192           -----     ----     -----------
   3193           RCount    4 bytes  Number of recipients.  
   3194                       
   3195           RCount   This defines the number of intended recipients whose
   3196                    public keys were used for encryption.  This defines
   3197                    the number of elements in the REList field defined below.
   3198 
   3199 
   3200     7.4.4 Reserved2 - Certificate Decryption Header Reserved2 Data Structures
   3201 
   3202 
   3203           Value     Size     Description
   3204           -----     ----     -----------
   3205           HashAlg   2 bytes  Hash algorithm identifier
   3206           HSize     2 bytes  Hash size
   3207           REList    (var)    List of recipient data elements
   3208 
   3209 
   3210          HashAlg   This defines the hash algorithm used to calculate
   3211                    the public key hash of each public key used
   3212                    for encryption. This field currently supports
   3213                    only the following value for SHA-1
   3214 
   3215                        0x8004 - SHA1
   3216 
   3217          HSize     This defines the size of a hashed public key
   3218                    defined in REHData.
   3219 
   3220          REList    This is a variable length list of recipient data.
   3221                    Each element in this list consists of a Recipient
   3222                    Element data structure as follows:
   3223 
   3224 
   3225         Recipient Element (REList) Data Structure:
   3226 
   3227               Value     Size     Description
   3228               -----     ----     -----------
   3229               RESize    2 bytes  Size of REHData + REKData
   3230               REHData   HSize    Hash of recipient's public key
   3231               REKData   (var)    Simple key blob
   3232 
   3233 
   3234              RESize    This defines the size of an individual REList 
   3235                        element.  This value is the combined size of the
   3236                        REHData field + REKData field.  REHData is defined by
   3237                        HSize.  REKData is variable and can be calculated
   3238                        for each REList element using RESize and HSize.
   3239 
   3240              REHData   Hashed public key for this recipient.
   3241 
   3242              REKData   Simple Key Blob.  The format of this data structure
   3243                        is identical to that defined in the Microsoft
   3244                        CryptoAPI and generated using the CryptExportKey()
   3245                        function.  The version of the Simple Key Blob
   3246                        supported at this time is 0x02 as defined by
   3247                        Microsoft.
   3248 
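The Reserved1 and Reserved2 structures can be walked together, because RCount from Reserved1 tells the reader how many Recipient Elements to expect in REList. Little-endian byte order is again assumed, and the Recipient type and function names are hypothetical. REKData is returned as raw bytes rather than interpreted as a CryptoAPI Simple Key Blob.

```python
import struct
from typing import NamedTuple

class Recipient(NamedTuple):
    key_hash: bytes  # REHData, HSize bytes
    key_blob: bytes  # REKData, the remaining RESize - HSize bytes

def parse_reserved1(buf: bytes) -> int:
    """Return RCount, the number of REList elements."""
    return struct.unpack_from("<I", buf, 0)[0]

def parse_reserved2(buf: bytes, rcount: int) -> tuple[int, list[Recipient]]:
    """Return (HashAlg, recipients) from Reserved2 data."""
    hash_alg, hsize = struct.unpack_from("<HH", buf, 0)
    pos = 4
    recipients = []
    for _ in range(rcount):
        (resize,) = struct.unpack_from("<H", buf, pos)  # size of REHData + REKData
        pos += 2
        if resize < hsize or pos + resize > len(buf):
            raise ValueError("malformed REList element")
        element = buf[pos:pos + resize]
        recipients.append(Recipient(element[:hsize], element[hsize:]))
        pos += resize
    return hash_alg, recipients
```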
   3249 7.5 Certificate Processing - Central Directory Encryption
   3250 ---------------------------------------------------------
   3251         
   3252     7.5.1 Central Directory Encryption using Digital Certificates will 
   3253     operate in a manner similar to that of Single Password Central
   3254     Directory Encryption.  The Archive Extra Data record will only be
   3255     present when there is data to place into it.  Currently, data is placed into this
   3256     record when digital certificates are used for either encrypting 
   3257     or signing the files within a ZIP file.  When only password 
   3258     encryption is used with no certificate encryption or digital 
   3259     signing, this record is not currently needed. When present, this 
   3260     record will appear before the start of the actual Central Directory 
   3261     data structure and will be located immediately after the Archive 
   3262     Decryption Header if the Central Directory is encrypted.
   3263 
   3264     7.5.2 The Archive Extra Data record will be used to store the following
   3265     information.  Additional data MAY be added in future versions.
   3266 
   3267     Extra Data Fields:
   3268 
   3269     0x0014 - PKCS#7 Store for X.509 Certificates
   3270     0x0016 - X.509 Certificate ID and Signature for central directory
   3271     0x0019 - PKCS#7 Encryption Recipient Certificate List
   3272 
   3273     The 0x0014 and 0x0016 Extra Data records would otherwise be
   3274     located in the first record of the Central Directory for digital 
   3275     certificate processing. When encrypting or compressing the Central 
   3276     Directory, the 0x0014 and 0x0016 records MUST be located in the 
   3277     Archive Extra Data record and they SHOULD NOT remain in the first 
   3278     Central Directory record.  The Archive Extra Data record will also 
   3279     be used to store the 0x0019 data. 
   3280 
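Extra field data consists of blocks, each made up of a 2-byte Header ID, a 2-byte data size, and the data itself, as defined elsewhere in this specification. The sketch below assumes the Archive Extra Data record's extra field data has already been extracted (and decrypted and decompressed, if applicable). It collects the 0x0014, 0x0016, and 0x0019 blocks. The function names are illustrative.

```python
import struct

CERT_RECORD_IDS = {
    0x0014: "PKCS#7 Store for X.509 Certificates",
    0x0016: "X.509 Certificate ID and Signature for central directory",
    0x0019: "PKCS#7 Encryption Recipient Certificate List",
}

def iter_extra_fields(data: bytes):
    """Yield (header_id, payload) for each extra field block; trailing bytes < 4 are ignored."""
    pos = 0
    while pos + 4 <= len(data):
        header_id, size = struct.unpack_from("<HH", data, pos)
        pos += 4
        if pos + size > len(data):
            raise ValueError(f"extra field 0x{header_id:04X} is truncated")
        yield header_id, data[pos:pos + size]
        pos += size

def certificate_records(data: bytes) -> dict[int, bytes]:
    """Keep only the certificate-related blocks stored in the Archive Extra Data record."""
    return {hid: payload for hid, payload in iter_extra_fields(data)
            if hid in CERT_RECORD_IDS}
```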
   3281     7.5.3 When present, the size of the Archive Extra Data record will be
   3282     included in the size of the Central Directory.  The data of the
   3283     Archive Extra Data record will also be compressed and encrypted
   3284     along with the Central Directory data structure.
   3285 
   3286 7.6 Certificate Processing Differences
   3287 --------------------------------------
   3288 
   3289     7.6.1 The Certificate Processing Method of encryption differs from the
   3290     Single Password Symmetric Encryption Method as follows.  Instead
   3291     of using a user-defined password to generate a master session key,
   3292     cryptographically random data is used.  The key material is then
   3293     wrapped using standard key-wrapping techniques.  This key material
   3294     is wrapped using the public key of each recipient that will need
   3295     to decrypt the file using their corresponding private key.
   3296 
   3297     7.6.2 This specification currently assumes digital certificates will follow
   3298     the X.509 V3 format for 1024 bit and higher RSA format digital
   3299     certificates.  Implementation of this Certificate Processing Method
   3300     requires supporting logic for key access and management.  This logic
   3301     is outside the scope of this specification.
   3302 
   3303 7.7 OAEP Processing with Certificate-based Encryption
   3304 -----------------------------------------------------
   3305 
   3306     7.7.1 OAEP stands for Optimal Asymmetric Encryption Padding.  It is a
   3307     strengthening technique used for small encoded items such as decryption
   3308     keys.  This is commonly applied in cryptographic key-wrapping techniques
   3309     and is supported by PKCS #1.  Versions 5.0 and 6.0 of this specification 
   3310     were designed to support OAEP key-wrapping for certificate-based 
   3311     decryption keys for additional security.  
   3312 
   3313     7.7.2 Support for private keys stored on Smartcards or Tokens introduced
   3314     a conflict with this OAEP logic.  Most card and token products do 
   3315     not support the additional strengthening applied to OAEP key-wrapped 
   3316     data.  In order to resolve this conflict, versions 6.1 and above of this 
   3317     specification will no longer support OAEP when encrypting using 
   3318     digital certificates. 
   3319 
   3320     7.7.3 Versions of PKZIP available during initial development of the 
   3321     certificate processing method set a value of 61 into the 
   3322     version needed to extract field for a file.  This indicates that 
   3323     non-OAEP key wrapping is used.  This affects certificate encryption 
   3324     only, and password encryption functions SHOULD NOT be affected by 
   3325     this value.  This means values of 61 MAY be found on files encrypted
   3326     with certificates only, or on files encrypted with both password
   3327     encryption and certificate encryption.  Files encrypted with both
   3328     methods can safely be decrypted using the password methods documented.
   3329 
   3330 7.8 Additional Encryption/Decryption Data Records
   3331 -----------------------------------------------------
   3332 
   3333     7.8.1 Additional information MAY be stored within a ZIP file in support
   3334     of the strong password and certificate encryption methods defined above.
   3335     These include, but are not limited to, the following record types:
   3336 
   3337       0x0021        Policy Decryption Key Record
   3338       0x0022        Smartcrypt Key Provider Record
   3339       0x0023        Smartcrypt Policy Key Data Record
   3340 
   3341 8.0  Splitting and Spanning ZIP files
   3342 -------------------------------------
   3343 
   3344     8.1 Spanned ZIP files
   3345 
   3346       8.1.1 Spanning is the process of segmenting a ZIP file across 
   3347       multiple removable media. This support has typically only 
   3348       been provided for DOS formatted floppy diskettes. 
   3349 
   3350     8.2 Split ZIP files
   3351 
   3352       8.2.1 File splitting is a newer derivation of spanning.  
   3353       Splitting follows the same segmentation process as
   3354       spanning; however, it does not require writing each
   3355       segment to a unique removable medium and instead supports
   3356       placing all pieces onto local or non-removable locations
   3357       such as file systems, local drives, folders, etc.
   3358 
   3359     8.3  File Naming Differences
   3360 
   3361       8.3.1 A key difference between spanned and split ZIP files is
   3362       that all pieces of a spanned ZIP file have the same name.  
   3363       Since each piece is written to a separate volume, no name 
   3364       collisions occur and each segment can reuse the original 
   3365       .ZIP file name given to the archive.
   3366 
   3367       8.3.2 Sequence ordering for DOS spanned archives uses the DOS 
   3368       volume label to determine segment numbers.  Volume labels
   3369       for each segment are written using the form PKBACK#xxx, 
   3370       where xxx is the segment number written as a decimal 
   3371       value from 001 - nnn.
   3372 
   3373       8.3.3 Split ZIP files are typically written to the same location
   3374       and are subject to name collisions if the spanned name
   3375       format is used since each segment will reside on the same 
   3376       drive. To avoid name collisions, split archives are named 
   3377       as follows.
   3378 
   3379       Segment 1   = filename.z01
   3380       Segment n-1 = filename.z(n-1)
   3381       Segment n   = filename.zip
   3382 
   3383       8.3.4 The .ZIP extension is used on the last segment to support
   3384       quickly reading the central directory.  The segment number
   3385       n SHOULD be a decimal value.
   3386         
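The naming rules in 8.3.2 through 8.3.4 can be expressed as two small helpers. The two-digit zero padding (z01 through z99, then z100) is an assumption based on the filename.z01 example. The function names are hypothetical.

```python
def split_segment_names(base: str, segments: int) -> list[str]:
    """Split archive names: base.z01 ... base.z(n-1), with base.zip as segment n."""
    if segments < 1:
        raise ValueError("a split archive needs at least one segment")
    names = [f"{base}.z{i:02d}" for i in range(1, segments)]
    names.append(f"{base}.zip")  # last segment keeps .zip for quick CD access
    return names

def spanned_volume_label(segment: int) -> str:
    """DOS volume label for a spanned segment: PKBACK#001, PKBACK#002, ..."""
    if segment < 1:
        raise ValueError("segment numbers start at 1")
    return f"PKBACK#{segment:03d}"
```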
   3387     8.4  Spanned Self-extracting ZIP Files
   3388         
   3389       8.4.1 Spanned ZIP files MAY be PKSFX Self-extracting ZIP files.
   3390       PKSFX files MAY also be split; however, in this case
   3391       the first segment MUST be named filename.exe.  The first
   3392       segment of a split PKSFX archive MUST be large enough to
   3393       include the entire executable program.
   3394 
   3395     8.5  Capacities and Markers
   3396         
   3397       8.5.1 Capacities for split archives are as follows:
   3398 
   3399       Maximum number of segments = 4,294,967,295 - 1
   3400       Maximum .ZIP segment size = 4,294,967,295 bytes 
   3401       Minimum segment size = 64K
   3402       Maximum PKSFX segment size = 2,147,483,647 bytes
   3403           
   3404       8.5.2 Segment sizes MAY be different; however, by convention all
   3405       segment sizes SHOULD be the same with the exception of the 
   3406       last, which MAY be smaller.  Local and central directory 
   3407       header records MUST NOT be split across a segment boundary. 
   3408       When writing a header record, if the number of bytes remaining 
   3409       within a segment is less than the size of the header record,
   3410       end the current segment and write the header at the start
   3411       of the next segment.  The central directory MAY span segment
   3412       boundaries, but no single record in the central directory
   3413       SHOULD be split across segments.
   3414 
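The header-placement rule in 8.5.2 amounts to a capacity check before each header record is written. The sketch below tracks only byte counts, not actual files. It assumes that 64K means 65,536 bytes and that file data, unlike header records, may continue into the next segment. The class and method names are hypothetical.

```python
MIN_SEGMENT = 64 * 1024          # "64K", assumed to mean 65,536 bytes
MAX_ZIP_SEGMENT = 4_294_967_295  # maximum .ZIP segment size

class SegmentPlanner:
    """Track segment usage so that no header record straddles a segment boundary."""

    def __init__(self, segment_size: int):
        if not MIN_SEGMENT <= segment_size <= MAX_ZIP_SEGMENT:
            raise ValueError("segment size out of range")
        self.segment_size = segment_size
        self.segment = 1  # current segment number, starting at 1
        self.used = 0     # bytes already written to the current segment

    def _next_segment(self) -> None:
        self.segment += 1
        self.used = 0

    def place_header(self, size: int) -> tuple[int, int]:
        """Return (segment, offset) at which a header record of `size` bytes starts."""
        if size > self.segment_size:
            raise ValueError("header record larger than a segment")
        if self.segment_size - self.used < size:
            self._next_segment()  # end this segment early; header starts the next one
        where = (self.segment, self.used)
        self.used += size
        return where

    def place_data(self, size: int) -> None:
        """Account for file data, which is assumed to be allowed to cross boundaries."""
        while size > 0:
            if self.used == self.segment_size:
                self._next_segment()
            chunk = min(size, self.segment_size - self.used)
            self.used += chunk
            size -= chunk
```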
   3415       8.5.3 Spanned/Split archives created using PKZIP for Windows
   3416       (V2.50 or greater), PKZIP Command Line (V2.50 or greater),
   3417       or PKZIP Explorer will include a special spanning 
   3418       signature as the first 4 bytes of the first segment of
   3419       the archive.  This signature (0x08074b50) will be 
   3420       followed immediately by the local header signature for
   3421       the first file in the archive.  
   3422 
   3423       8.5.4 A special spanning marker MAY also appear in spanned/split 
   3424       archives if the spanning or splitting process starts but 
   3425       only requires one segment.  In this case the 0x08074b50 
   3426       signature will be replaced with the temporary spanning 
   3427       marker signature of 0x30304b50.  Split archives can
   3428       only be uncompressed by other versions of PKZIP that
   3429       know how to create a split archive.
   3430 
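The position rule in 8.5.5 can be applied by examining the first 8 bytes of the first segment. A spanning marker only appears at offset 0 and is immediately followed by a local header signature (0x04034b50, defined elsewhere in this specification). A data descriptor signature, by contrast, appears after file data. The return strings and function name are illustrative.

```python
import struct

LOCAL_HEADER_SIG = 0x04034B50
SPANNING_SIG = 0x08074B50       # spanning marker; also a data descriptor signature
TEMP_SPANNING_SIG = 0x30304B50  # temporary marker for a one-segment span/split

def classify_archive_start(first_segment: bytes) -> str:
    """Interpret the leading signature of segment 1 by its position (offset 0)."""
    if len(first_segment) < 8:
        raise ValueError("need at least 8 bytes from the start of the first segment")
    first, second = struct.unpack_from("<II", first_segment, 0)
    if first == LOCAL_HEADER_SIG:
        return "no spanning marker"
    if second == LOCAL_HEADER_SIG:
        if first == SPANNING_SIG:
            return "spanned/split archive"
        if first == TEMP_SPANNING_SIG:
            return "single-segment spanning marker"
    return "unknown"
```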
   3431       8.5.5 The signature value 0x08074b50 is also used by some
   3432       ZIP implementations as a marker for the Data Descriptor 
   3433       record.  Conflict in this alternate assignment can be
   3434       avoided by using the position of the signature
   3435       within the ZIP file to determine the use for which it
   3436       is intended.  
   3437 
   3438 9.0 Change Process
   3439 ------------------
   3440 
   3441    9.1 In order for the .ZIP file format to remain a viable technology, this
   3442    specification SHOULD be considered as open for periodic review and
   3443    revision.  Although this format was originally designed with a 
   3444    certain level of extensibility, not all changes in technology
   3445    (present or future) were or will be necessarily considered in its
   3446    design.  
   3447 
   3448    9.2 If your application requires new definitions to the
   3449    extensible sections in this format, or if you would like to 
   3450    submit new data structures or new capabilities, please forward 
   3451    your request to zipformat@pkware.com.  All submissions will be 
   3452    reviewed by the ZIP File Specification Committee for possible 
   3453    inclusion into future versions of this specification.  
   3454 
   3455    9.3 Periodic revisions to this specification will be published as
   3456    DRAFT or as FINAL status to ensure interoperability. We encourage 
   3457    comments and feedback that MAY help improve clarity or content.
   3458 
   3459 
   3460 10.0 Incorporating PKWARE Proprietary Technology into Your Product
   3461 ------------------------------------------------------------------
   3462 
   3463    10.1 The Use or Implementation in a product of APPNOTE technological 
   3464    components pertaining to either strong encryption or patching requires 
   3465    a separate, executed license agreement from PKWARE. Please contact 
   3466    PKWARE at zipformat@pkware.com or +1-414-289-9788 with regard to 
   3467    acquiring such a license.
   3468 
   3469    10.2 Additional information regarding PKWARE proprietary technology is 
   3470    available at http://www.pkware.com/appnote.
   3471 
   3472 11.0 Acknowledgements
   3473 ---------------------
   3474 
   3475    In addition to the above-mentioned contributors to PKZIP and PKUNZIP,
   3476    PKWARE would like to extend special thanks to Robert Mahoney for 
   3477    suggesting the extension .ZIP for this software.
   3478 
   3479 12.0 References
   3480 ---------------
   3481 
   3482    Fiala, Edward R., and Greene, Daniel H., "Data compression with
   3483       finite windows", Communications of the ACM, Volume 32, Number 4,
   3484       April 1989, pages 490-505.
   3485 
   3486    Held, Gilbert, "Data Compression, Techniques and Applications,
   3487       Hardware and Software Considerations", John Wiley & Sons, 1987.
   3488 
   3489    Huffman, D.A., "A method for the construction of minimum-redundancy
   3490       codes", Proceedings of the IRE, Volume 40, Number 9, September 1952,
   3491       pages 1098-1101.
   3492 
   3493    Nelson, Mark, "LZW Data Compression", Dr. Dobb's Journal, Volume 14,
   3494       Number 10, October 1989, pages 29-37.
   3495
   3496    Nelson, Mark, "The Data Compression Book", M&T Books, 1991.
   3497
   3498    Storer, James A., "Data Compression, Methods and Theory",
   3499       Computer Science Press, 1988.
   3500 
   3501    Welch, Terry, "A Technique for High-Performance Data Compression",
   3502       IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.
   3503 
   3504    Ziv, J. and Lempel, A., "A universal algorithm for sequential data
   3505       compression", IEEE Transactions on Information Theory,
   3506       Volume 23, Number 3, May 1977, pages 337-343.
   3507 
   3508    Ziv, J. and Lempel, A., "Compression of individual sequences via
   3509       variable-rate coding", IEEE Transactions on Information Theory,
   3510       Volume 24, Number 5, September 1978, pages 530-536.
   3511 
   3512 
   3513 APPENDIX A - AS/400 Extra Field (0x0065) Attribute Definitions
   3514 --------------------------------------------------------------
   3515 
   3516 A.1 Field Definition Structure:
   3517 
   3518    a. field length including length             2 bytes Big Endian
   3519    b. field code                                2 bytes
   3520    c. data                                      x bytes
   3521 
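The field structure in A.1 (and the identical structure in B.1) can be walked as shown below. This sketch operates on the attribute data only. It assumes that the big-endian length counts the 2-byte length itself, the 2-byte field code, and the data, and that the field code is also big-endian; neither assumption is spelled out above. The function name is hypothetical.

```python
import struct

def iter_0065_fields(data: bytes):
    """Yield (field_code, payload) from AS/400 or z/OS 0x0065 attribute data."""
    pos = 0
    while pos + 4 <= len(data):
        # Assumption: length includes its own 2 bytes, the 2-byte code, and the data.
        length, code = struct.unpack_from(">HH", data, pos)
        if length < 4 or pos + length > len(data):
            raise ValueError(f"bad length {length} for field 0x{code:04X}")
        yield code, data[pos + 4:pos + length]
        pos += length
```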
   3522 A.2 Field Code  Description
   3523 
   3524    4001     Source type, e.g. CLP
   3525    4002     The text description of the library 
   3526    4003     The text description of the file
   3527    4004     The text description of the member
   3528    4005     x'F0' or 0 is PF-DTA, x'F1' or 1 is PF-SRC
   3529    4007     Database Type Code                  1 byte
   3530    4008     Database file and fields definition
   3531    4009     GZIP file type                      2 bytes
   3532    400B     IFS code page                       2 bytes
   3533    400C     IFS Time of last file status change 4 bytes
   3534    400D     IFS Access Time                     4 bytes
   3535    400E     IFS Modification time               4 bytes
   3536    005C     Length of the records in the file   2 bytes
   3537    0068     GZIP two words                      8 bytes
   3538 
   3539 APPENDIX B - z/OS Extra Field (0x0065) Attribute Definitions
   3540 ------------------------------------------------------------
   3541 
   3542 B.1 Field Definition Structure:
   3543 
   3544    a. field length including length             2 bytes Big Endian
   3545    b. field code                                2 bytes
   3546    c. data                                      x bytes
   3547 
   3548 B.2 Field Code  Description
   3549 
   3550    0001     File Type                           2 bytes 
   3551    0002     NonVSAM Record Format               1 byte
   3552    0003     Reserved                
   3553    0004     NonVSAM Block Size                  2 bytes Big Endian
   3554    0005     Primary Space Allocation            3 bytes Big Endian
   3555    0006     Secondary Space Allocation          3 bytes Big Endian
   3556    0007     Space Allocation Type               1 byte flag
   3557    0008     Modification Date                   Retired with PKZIP 5.0 +
   3558    0009     Expiration Date                     Retired with PKZIP 5.0 +
   3559    000A     PDS Directory Block Allocation      3 bytes Big Endian binary value
   3560    000B     NonVSAM Volume List                 variable                
   3561    000C     UNIT Reference                      Retired with PKZIP 5.0 +
   3562    000D     DF/SMS Management Class             8 bytes EBCDIC Text Value
   3563    000E     DF/SMS Storage Class                8 bytes EBCDIC Text Value
   3564    000F     DF/SMS Data Class                   8 bytes EBCDIC Text Value
   3565    0010     PDS/PDSE Member Info.               30 bytes        
   3566    0011     VSAM sub-filetype                   2 bytes                
   3567    0012     VSAM LRECL                          13 bytes EBCDIC "(num_avg num_max)"
   3568    0013     VSAM Cluster Name                   Retired with PKZIP 5.0 +
   3569    0014     VSAM KSDS Key Information           13 bytes EBCDIC "(num_length num_position)"
   3570    0015     VSAM Average LRECL                  5 bytes EBCDIC num_value padded with blanks
   3571    0016     VSAM Maximum LRECL                  5 bytes EBCDIC num_value padded with blanks
   3572    0017     VSAM KSDS Key Length                5 bytes EBCDIC num_value padded with blanks
   3573    0018     VSAM KSDS Key Position              5 bytes EBCDIC num_value padded with blanks
   3574    0019     VSAM Data Name                      1-44 bytes EBCDIC text string
   3575    001A     VSAM KSDS Index Name                1-44 bytes EBCDIC text string
   3576    001B     VSAM Catalog Name                   1-44 bytes EBCDIC text string
   3577    001C     VSAM Data Space Type                9 bytes EBCDIC text string
   3578    001D     VSAM Data Space Primary             9 bytes EBCDIC num_value left-justified
   3579    001E     VSAM Data Space Secondary           9 bytes EBCDIC num_value left-justified
   3580    001F     VSAM Data Volume List               variable EBCDIC text list of 6-character Volume IDs
   3581    0020     VSAM Data Buffer Space              8 bytes EBCDIC num_value left-justified
   3582    0021     VSAM Data CISIZE                    5 bytes EBCDIC num_value left-justified
   3583    0022     VSAM Erase Flag                     1 byte flag                
   3584    0023     VSAM Free CI %                      3 bytes EBCDIC num_value left-justified
   3585    0024     VSAM Free CA %                      3 bytes EBCDIC num_value left-justified
   3586    0025     VSAM Index Volume List              variable EBCDIC text list of 6-character Volume IDs
   3587    0026     VSAM Ordered Flag                   1 byte flag                
   3588    0027     VSAM REUSE Flag                     1 byte flag                
   3589    0028     VSAM SPANNED Flag                   1 byte flag                
   3590    0029     VSAM Recovery Flag                  1 byte flag                
   3591    002A     VSAM WRITECHK Flag                  1 byte flag
   3592    002B     VSAM Cluster/Data SHROPTS           3 bytes EBCDIC "n,y"        
   3593    002C     VSAM Index SHROPTS                  3 bytes EBCDIC "n,y"        
   3594    002D     VSAM Index Space Type               9 bytes EBCDIC text string
   3595    002E     VSAM Index Space Primary            9 bytes EBCDIC num_value left-justified
   002F     VSAM Index Space Secondary          9 bytes EBCDIC num_value left-justified
   0030     VSAM Index CISIZE                   5 bytes EBCDIC num_value left-justified
   0031     VSAM Index IMBED                    1 byte flag
   0032     VSAM Index Ordered Flag             1 byte flag
   0033     VSAM REPLICATE Flag                 1 byte flag
   0034     VSAM Index REUSE Flag               1 byte flag
   0035     VSAM Index WRITECHK Flag            1 byte flag (retired with PKZIP 5.0+)
   0036     VSAM Owner                          8 bytes EBCDIC text string
   0037     VSAM Index Owner                    8 bytes EBCDIC text string
   0038     Reserved
   0039     Reserved
   003A     Reserved
   003B     Reserved
   003C     Reserved
   003D     Reserved
   003E     Reserved
   003F     Reserved
   0040     Reserved
   0041     Reserved
   0042     Reserved
   0043     Reserved
   0044     Reserved
   0045     Reserved
   0046     Reserved
   0047     Reserved
   0048     Reserved
   0049     Reserved
   004A     Reserved
   004B     Reserved
   004C     Reserved
   004D     Reserved
   004E     Reserved
   004F     Reserved
   0050     Reserved
   0051     Reserved
   0052     Reserved
   0053     Reserved
   0054     Reserved
   0055     Reserved
   0056     Reserved
   0057     Reserved
   0058     PDS/PDSE Member TTR Info.           6 bytes  Big Endian
   0059     PDS 1st LMOD Text TTR               3 bytes  Big Endian
   005A     PDS LMOD EP Rec #                   4 bytes  Big Endian
   005B     Reserved
   005C     Max Length of records               2 bytes  Big Endian
   005D     PDSE Flag                           1 byte flag
   005E     Reserved
   005F     Reserved
   0060     Reserved
   0061     Reserved
   0062     Reserved
   0063     Reserved
   0064     Reserved
   0065     Last Date Referenced                4 bytes  Packed Hex "yyyymmdd"
   0066     Date Created                        4 bytes  Packed Hex "yyyymmdd"
   0068     GZIP two words                      8 bytes
   0071     Extended NOTE Location             12 bytes  Big Endian
   0072     Archive device UNIT                 6 bytes  EBCDIC
   0073     Archive 1st Volume                  6 bytes  EBCDIC
   0074     Archive 1st VOL File Seq#           2 bytes  Binary
   0075     Native I/O Flags                    2 bytes
   0081     Unix File Type                      1 byte   enumerated
   0082     Unix File Format                    1 byte   enumerated
   0083     Unix File Character Set Tag Info    4 bytes
   0090     ZIP Environmental Processing Info   4 bytes
   0091     EAV EATTR Flags                     1 byte
   0092     DSNTYPE Flags                       1 byte
   0093     Total Space Allocation (Cyls)       4 bytes  Big Endian
   009D     NONVSAM DSORG                       2 bytes
   009E     Program Virtual Object Info         3 bytes
   009F     Encapsulated File Info              9 bytes
   00A2     Cluster Log                         4 bytes  Binary
   00A3     Cluster LSID Length                 4 bytes  Binary
   00A4     Cluster LSID                       26 bytes  EBCDIC
   400C     Unix File Creation Time             4 bytes
   400D     Unix File Access Time               4 bytes
   400E     Unix File Modification Time         4 bytes
   4101     IBMCMPSC Compression Info           variable
   4102     IBMCMPSC Compression Size           8 bytes  Big Endian

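As an illustration of how a reader might decode three of the value formats in this table, the following Python sketch handles EBCDIC text strings (for example 0036), left-justified EBCDIC num_values (for example 0030), and the 4-byte Packed Hex "yyyymmdd" dates (0065 and 0066).  It is a sketch under stated assumptions, not part of this specification: the EBCDIC code page (cp037), blank or NUL padding, and the reading of "Packed Hex" as one decimal digit per 4-bit nibble are all assumptions, and the function names are hypothetical.

```python
import datetime

# Assumption: EBCDIC code page 037. z/OS systems may use another EBCDIC
# page (for example IBM-1047), which Python's standard codecs do not include.
EBCDIC = "cp037"


def decode_ebcdic_text(raw: bytes) -> str:
    """Decode an EBCDIC text attribute such as 0036 (VSAM Owner).

    Assumes the value is padded on the right with EBCDIC blanks (0x40).
    """
    return raw.decode(EBCDIC).rstrip(" ")


def decode_ebcdic_num(raw: bytes) -> int:
    """Decode a left-justified EBCDIC num_value such as 0030 (VSAM Index CISIZE).

    Assumes the digits are followed by blank (0x40) or NUL (0x00) padding.
    """
    return int(raw.decode(EBCDIC).rstrip(" \x00"))


def decode_packed_hex_date(raw: bytes) -> datetime.date:
    """Decode a 4-byte Packed Hex "yyyymmdd" date such as 0065 or 0066.

    Assumes each 4-bit nibble holds one decimal digit, so the bytes
    20 22 11 01 (hex) read as the digit string "20221101".
    """
    if len(raw) != 4:
        raise ValueError("packed hex date must be 4 bytes")
    digits = raw.hex()
    if not digits.isdigit():
        raise ValueError(f"non-decimal nibble in packed date: {digits}")
    return datetime.date(int(digits[0:4]), int(digits[4:6]), int(digits[6:8]))
```

Under this interpretation, the bytes 20 22 11 01 (hex) decode to November 1, 2022.
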
APPENDIX C - Zip64 Extensible Data Sector Mappings
---------------------------------------------------

         -Z390   Extra Field:

          The following is the general layout of the attributes for the
          Zip64 "extra" block for extended tape operations.

          Note: some fields are stored in Big Endian format.  All text is
          in EBCDIC format unless otherwise specified.

          Value       Size          Description
          -----       ----          -----------
  (Z390)  0x0065      2 bytes       Tag for this "extra" block type
          Size        4 bytes       Size for the following data block
          Tag         4 bytes       EBCDIC "Z390"
          Length71    2 bytes       Big Endian
          Subcode71   2 bytes       Enote type code
          FMEPos      1 byte
          Length72    2 bytes       Big Endian
          Subcode72   2 bytes       Unit type code
          Unit        1 byte        Unit
          Length73    2 bytes       Big Endian
          Subcode73   2 bytes       Volume1 type code
          FirstVol    1 byte        Volume
          Length74    2 bytes       Big Endian
          Subcode74   2 bytes       FirstVol file sequence
          FileSeq     2 bytes       Sequence

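The Z390 layout above can be read with a fixed struct format, as in the following Python sketch.  It parses only the data that follows the 0x0065 tag and the 4-byte Size field.  Only the LengthNN fields are documented as Big Endian; reading the SubcodeNN and FileSeq fields as Big Endian, and decoding the tag with EBCDIC code page 037, are assumptions.  Z390_FORMAT, Z390Block, and parse_z390 are hypothetical names.

```python
import struct
from typing import NamedTuple

# Tag through FileSeq from the Z390 table. Big Endian for the Subcode and
# FileSeq fields is an assumption; the table only marks LengthNN as such.
Z390_FORMAT = ">4sHHBHHBHHBHHH"


class Z390Block(NamedTuple):
    tag: str
    length71: int
    subcode71: int
    fme_pos: int
    length72: int
    subcode72: int
    unit: int
    length73: int
    subcode73: int
    first_vol: int
    length74: int
    subcode74: int
    file_seq: int


def parse_z390(data: bytes, ebcdic: str = "cp037") -> Z390Block:
    """Parse a Z390 block body (everything after the 0x0065 tag and Size).

    The code page is an assumption; cp037 and IBM-1047 encode "Z390"
    identically (E9 F3 F9 F0 hex).
    """
    size = struct.calcsize(Z390_FORMAT)
    if len(data) < size:
        raise ValueError(f"Z390 block needs {size} bytes, got {len(data)}")
    fields = struct.unpack(Z390_FORMAT, data[:size])
    tag = fields[0].decode(ebcdic)
    if tag != "Z390":
        raise ValueError(f"unexpected tag {tag!r}")
    return Z390Block(tag, *fields[1:])
```

With the ">" prefix, Python's struct module adds no alignment padding, so the format covers exactly the 25 bytes listed from Tag through FileSeq.
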
APPENDIX D - Language Encoding (EFS)
------------------------------------

D.1 The ZIP format has historically supported only the original IBM PC character
encoding set, commonly referred to as IBM Code Page 437.  This limits file name
characters to those within the original MS-DOS range of values and does not
properly support file names in other character encodings or languages.  To
address this limitation, this specification supports the following change.

D.2 If general purpose bit 11 is unset, the file name and comment SHOULD conform
to the original ZIP character encoding.  If general purpose bit 11 is set, the
file name and comment MUST support The Unicode Standard, Version 4.1.0 or
greater using the character encoding form defined by the UTF-8 storage
specification.  The Unicode Standard is published by The Unicode
Consortium (www.unicode.org).  UTF-8 encoded data stored within ZIP files
is expected not to include a byte order mark (BOM).

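The rule in D.2 can be sketched in Python as follows.  When writing, this sketch keeps a name in Code Page 437 if every character fits and otherwise stores UTF-8 (without a BOM) and sets bit 11; when reading, it decodes according to bit 11.  The fallback policy and the helper names are illustrative assumptions, not requirements of this specification.

```python
from typing import Tuple

FLAG_UTF8 = 1 << 11  # general purpose bit 11 (language encoding flag)


def encode_name(name: str) -> Tuple[bytes, int]:
    """Return (stored bytes, general purpose flag bits) for a file name.

    Policy of this sketch: keep CP437 when every character fits, leaving
    bit 11 clear; otherwise store UTF-8 without a BOM and set bit 11.
    """
    try:
        return name.encode("cp437"), 0
    except UnicodeEncodeError:
        return name.encode("utf-8"), FLAG_UTF8


def decode_name(raw: bytes, flags: int) -> str:
    """Decode a stored file name or comment according to bit 11."""
    if flags & FLAG_UTF8:
        return raw.decode("utf-8")
    return raw.decode("cp437")
```

A writer could instead set bit 11 for every name containing non-ASCII characters; either policy is consistent with D.2 as long as the flag matches the bytes actually stored.
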
D.3 Applications MAY choose to supplement this file name storage through the use
of the 0x0008 Extra Field.  Storage for this optional field is currently
undefined; however, it will be used to allow storing extended information
on source or target encoding that MAY further assist applications with file
name or file content encoding tasks.  Please contact PKWARE with any
requirements on how this field SHOULD be used.

D.4 The 0x0008 Extra Field storage MAY be used with either setting for general
purpose bit 11.  Examples of the intended usage for this field are to store
whether "modified-UTF-8" (Java) or UTF-8-MAC is used.  Similarly, other
commonly used character encoding (code page) designations can be indicated
through this field.  Formalized values for use of the 0x0008 record remain
undefined at this time.  The definition for the layout of the 0x0008 field
will be published when available.  Use of the 0x0008 Extra Field provides
for storing data within a ZIP file in an encoding other than IBM Code
Page 437 or UTF-8.

D.5 General purpose bit 11 will not imply any encoding of file content or
password.  Values defining character encoding for file content or
password MUST be stored within the 0x0008 Extended Language Encoding
Extra Field.

D.6 Ed Gordon of the Info-ZIP group has defined a pair of "extra field" records
that can be used to store UTF-8 file name and file comment fields.  These
records can be used when the general purpose bit 11 method for storing
UTF-8 data in the standard file name and comment fields is not desirable.
A common case for this alternate method is when backward compatibility
with older programs is required.

D.7 Definitions for the record structure of these fields are included above
in the section on 3rd party mappings for "extra field" records.  These
records are identified by Header IDs 0x6375 (Info-ZIP Unicode Comment
Extra Field) and 0x7075 (Info-ZIP Unicode Path Extra Field).

D.8 The choice of which storage method to use when writing a ZIP file is left
to the implementation.  Developers SHOULD expect that a ZIP file MAY
contain either method and SHOULD provide support for reading data in
either format.  Use of general purpose bit 11 reduces storage requirements
for file name data by not requiring additional "extra field" data for
each file, but it can leave older ZIP programs unable to extract those
files.  Use of the 0x6375 and 0x7075 records will result in a ZIP file
that SHOULD always be readable by older ZIP programs, but requires more
storage per file to write file name and/or file comment fields.

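A reader that follows D.8 has to accept both storage methods.  The Python sketch below resolves a file name by checking bit 11 first, then an Info-ZIP Unicode Path Extra Field (0x7075), and finally falling back to Code Page 437.  It assumes the usual extra field framing (2-byte little-endian Header ID, 2-byte little-endian data size) and the 0x7075 payload given in the 3rd party mappings section referenced by D.7: a 1-byte version (currently 1), a 4-byte CRC-32 of the standard file name field, then the UTF-8 name.  The function names and the lookup order are illustrative choices.

```python
import struct
import zlib
from typing import Iterator, Tuple

FLAG_UTF8 = 1 << 11        # general purpose bit 11
UNICODE_PATH_ID = 0x7075   # Info-ZIP Unicode Path Extra Field


def iter_extra_fields(extra: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (header_id, data) pairs from an extra field block."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            break  # truncated record: stop rather than read past the end
        yield header_id, extra[pos:pos + size]
        pos += size


def resolve_name(raw_name: bytes, flags: int, extra: bytes) -> str:
    """Pick a name from bit 11, then a 0x7075 record, then CP437.

    This order of preference is a choice of the sketch, not a rule of D.8.
    """
    if flags & FLAG_UTF8:
        return raw_name.decode("utf-8")
    for header_id, data in iter_extra_fields(extra):
        if header_id != UNICODE_PATH_ID or len(data) < 5:
            continue
        version, name_crc = struct.unpack_from("<BI", data, 0)
        # Trust the UTF-8 name only if it was written for this exact file
        # name field; a mismatch means the entry was renamed afterwards.
        if version == 1 and name_crc == zlib.crc32(raw_name):
            return data[5:].decode("utf-8")
    return raw_name.decode("cp437")
```

Checking the stored CRC-32 guards against a tool that renamed the entry without updating the 0x7075 record; when the CRC does not match, the sketch ignores the record and uses the standard file name field.
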
APPENDIX E - AE-x encryption marker
-----------------------------------

E.1 AE-x defines an alternate password-based encryption method used
in ZIP files that is based on a file encryption utility developed by
Dr. Brian Gladman.  Information on Dr. Gladman's method is available at:

   http://www.gladman.me.uk/cryptography_technology/fileencrypt/

E.2 AE-x uses AES in CTR (counter) mode with HMAC-SHA1 authentication.  It
defines encryption using key sizes of 128 bits or 256 bits.  It does not
preclude support for decrypting data encrypted with 192-bit keys.

E.3 This method uses the standard ZIP encryption bit (bit 0)
of the general purpose bit flag (section 4.4.4) to indicate a
file is encrypted.

E.4 The compression method field (section 4.4.5) is set to 99
to indicate a file has been encrypted using this method.

E.5 The actual compression method is stored in an extra field
structure identified by a Header ID of 0x9901.  Information on this
record structure can be found at http://www.winzip.com/aes_info.htm.

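The checks in E.3 through E.5 can be combined as in the following Python sketch, which reports whether an entry uses AE-x and, if so, which key size and actual compression method its 0x9901 record carries.  The 7-byte 0x9901 payload used here (2-byte vendor version, 2-byte vendor ID "AE", 1-byte strength where 1, 2 and 3 mean 128, 192 and 256 bits, then the 2-byte actual compression method, all little-endian) follows the WinZip documentation cited in E.5 rather than this appendix.  The names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

AES_METHOD = 99                      # compression method marking AE-x (E.4)
AES_EXTRA_ID = 0x9901                # extra field with the real method (E.5)
KEY_BITS = {1: 128, 2: 192, 3: 256}  # strength byte -> AES key size


class AexInfo(NamedTuple):
    vendor_version: int  # 1 or 2, see E.6
    key_bits: int
    actual_method: int


def parse_aex(flags: int, method: int, extra: bytes) -> Optional[AexInfo]:
    """Return AE-x details for an entry, or None if it is not AE-x encrypted."""
    if not (flags & 0x0001) or method != AES_METHOD:
        return None  # bit 0 clear or method is not 99 (E.3, E.4)
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if header_id == AES_EXTRA_ID and len(data) >= 7:
            version, vendor, strength, actual = struct.unpack_from("<H2sBH", data)
            if vendor != b"AE" or strength not in KEY_BITS:
                raise ValueError("malformed 0x9901 record")
            return AexInfo(version, KEY_BITS[strength], actual)
    raise ValueError("method 99 without a 0x9901 extra field")
```
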
E.6 Two versions are defined for the 0x9901 structure.

   E.6.1 Version 1 stores the file CRC value in the CRC-32 field
   (section 4.4.7).

   E.6.2 Version 2 stores a value of 0 in the CRC-32 field.

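Because the two 0x9901 versions treat the CRC-32 field differently, a reader has to branch on the version when verifying extracted data.  The Python sketch below shows one way to do this.  The function name is hypothetical, and treating a nonzero stored CRC in a version 2 entry as a failure is a strictness choice of this sketch rather than a rule stated here.  For version 2 entries, integrity checking would rely on the HMAC-SHA1 authentication code instead, which this sketch does not verify.

```python
import zlib


def crc_ok(vendor_version: int, stored_crc: int, uncompressed: bytes) -> bool:
    """Check the CRC-32 field of an AE-x entry after decryption and decompression.

    Version 1 stores the real CRC, so it is compared against the data.
    Version 2 stores 0, so only that placeholder value is checked here.
    """
    if vendor_version == 2:
        return stored_crc == 0
    if vendor_version == 1:
        return zlib.crc32(uncompressed) == stored_crc
    raise ValueError(f"unknown AE-x vendor version {vendor_version}")
```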