This change adds two manpages: apk-v2(5) and apk-v3(5). These pages describe the v2 and v3 file formats respectively, as I currently understand them.
118 lines
4.1 KiB
Markdown
118 lines
4.1 KiB
Markdown
apk-v3(5)
|
|
|
|
# NAME
|
|
|
|
*apk v3* - overview of apk v3 format
|
|
|
|
# DECRIPTION
|
|
|
|
A v3 .apk file contains a single package's contents, some metadata, and
|
|
some signatures. The .apk file contains a tree of objects, represented
|
|
in a custom binary format and conforming overall to a pre-defined
|
|
schema. This file format is referred to inside *apk*(5) as "adb".
|
|
|
|
# WIRE FORMAT
|
|
|
|
A v3 apk file is composed of sequences of serialized values, each of
|
|
which begins with a 32-bit little-endian word - the value's tag. The
|
|
high 4 bits of the tag are a type code, and the low 28 bits are used for
|
|
an immediate value. Defined type codes are:
|
|
|
|
0x0 Special (direct)
|
|
0x1 Int (direct)
|
|
0x2 Int32 (indirect)
|
|
0x3 Int64 (indirect)
|
|
0x8 Blob8 (indirect)
|
|
0x9 Blob16 (indirect)
|
|
0xa Blob32 (indirect)
|
|
0xd Array (indirect)
|
|
0xe Object (indirect)
|
|
|
|
A direct value is packed into the low 28 bits of the tag word; an
|
|
indirect value is instead stored elsewhere in the file, and the offset
|
|
of that indirect value is packed into the low 28 bits of the tag word.
|
|
|
|
Arrays and objects are represented with a sequence of numbered slots;
|
|
the value packed into their tag word is the offset at which this
|
|
sequence starts. The first slot is always the total number of slots, so
|
|
all arrays and objects contain at least one item.
|
|
|
|
The only real difference between arrays and objects in the wire encoding
|
|
is that arrays are homogenous, whereas objects are heterogenous with a
|
|
separate defined type for each slot.
|
|
|
|
The special type is used to represent three atoms:
|
|
|
|
0x0 NULL
|
|
0x1 TRUE
|
|
0x2 FALSE
|
|
|
|
# FILE SCHEMAS
|
|
|
|
A schema is a representation of what data elements are expected in an
|
|
adb file. Schemas form a tree, where nodes are either scalar schemas
|
|
(which are leaves in the tree) or array/object schemas, which themselves
|
|
have children. For example, the schema for a package object might
|
|
declare that it contains fields which themselves conform to the string
|
|
array schema, or the pkginfo schema, or similar.
|
|
|
|
The schemas themselves are not represented in the adb file in any way;
|
|
they exist in the parts of *apk*(1) that read and write such files. A
|
|
full description of all of apk's schemas would be lengthy, but as an
|
|
example, here is the schema for a single file inside a package:
|
|
|
|
ADBI_FI_NAME "name" string
|
|
ADBI_FI_ACL "acl" acl
|
|
ADBI_FI_SIZE "size" int
|
|
ADBI_FI_MTIME "mtime" int
|
|
ADBI_FI_HASHES "hash" hexblob
|
|
ADBI_FI_TARGET "target" hexblob
|
|
|
|
Here, all of the fields except for "acl" are scalars, and acl is itself
|
|
a schema looking like:
|
|
|
|
ADBI_ACL_MODE "mode" oct
|
|
ADBI_ACL_USER "user" string
|
|
ADBI_ACL_GROUP "group" string
|
|
|
|
# BLOCKS
|
|
|
|
An actual adb file is composed of a sequence of typed blocks; a block
|
|
also begins with a 32-bit little-endian tag word, which has two bits of
|
|
type and 30 bits of size. The two type bits are:
|
|
|
|
0x0 ADB
|
|
0x1 SIG
|
|
0x2 DATA
|
|
0x3 DATAX
|
|
|
|
The adb file must begin with one ADB block, then optionally one SIG
|
|
block, then one or more DATA blocks. The ADB block must begin with a
|
|
magic number indicating the schema for the entire ADB block's root
|
|
object. The ADB block also contains, outside the root object, some
|
|
metadata describing the version of the adb format in use.
|
|
|
|
The SIG block contains a signature of the ADB block. Unlike the v2
|
|
format, the key used for the signature is not explicitly specified, so
|
|
verifiers must try all trusted keys until they find one. Also unlike the
|
|
v2 format, the only supported hash algorithm is SHA512, and the
|
|
signature scheme is implied by the signing key in use rather than being
|
|
derived from the signature block.
|
|
|
|
The DATA blocks are used to store package file data only; all file
|
|
metadata, including content hashes, is stored in the ADB block instead.
|
|
The contents of the DATA blocks are therefore protected by the hashes
|
|
given in the ADB block, which is itself protected by the signature in
|
|
the SIG block.
|
|
|
|
It is currently illegal for a DATAX block to appear.
|
|
|
|
# NOTES
|
|
|
|
The v3 file format is entangled with C struct layout, since it sometimes
|
|
directly writes structs into the adb section, including any
|
|
compiler-added padding and such.
|
|
|
|
# SEE ALSO
|
|
|
|
*abuild*(1), *apk*(1), *apk-v2*(5)
|