mirror of https://github.com/google/flatbuffers.git
synced 2026-06-19 13:13:06 +00:00
First attempt at SchemaLess FlatBuffers.
Change-Id: I86b9d002f3441ef9efdb70e059b8530ab2d74bb8
Tested: on Linux.
committed by Wouter van Oortmerssen
parent dabe030890
commit aac6be1153
@@ -292,4 +292,148 @@ flexibility in which of the children of root object to write first (though in
this case there's only one string), and what order to write the fields in.
Different orders may also cause different alignments to happen.

# FlexBuffers

The [schema-less](@ref flexbuffers) version of FlatBuffers has its
own encoding, detailed here.

It shares many properties mentioned above, in that all data is accessed
over offsets, all scalars are aligned to their own size, and
all data is always stored in little-endian format.

One difference is that FlexBuffers are built front to back, so children are
stored before parents, and the root of the data starts at the last byte.

Another difference is that scalar data is stored with a variable number of bits
(8/16/32/64). The current width is always determined by the *parent*, i.e. if
the scalar sits in a vector, the vector determines the bit width for all
elements at once. The encoder selects the minimum bit width for a particular
vector automatically, so this is typically of no concern to the user, though
being aware of this feature (and not sticking a double in the same vector as
a bunch of byte-sized elements) is helpful for efficiency.

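The width selection described above can be sketched as follows. This is only an illustration for unsigned values: `MinByteWidth` and `VectorByteWidth` are hypothetical names that do not come from flexbuffers.h, and a real encoder would also have to account for signed values, floats and offsets.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: smallest byte width (1, 2, 4 or 8) that can hold
// an unsigned value.
inline int MinByteWidth(uint64_t v) {
  if (v <= 0xFFu) return 1;
  if (v <= 0xFFFFu) return 2;
  if (v <= 0xFFFFFFFFu) return 4;
  return 8;
}

// A vector uses one width for everything it stores, so it needs the widest
// width required by any element or by its size field.
inline int VectorByteWidth(const uint64_t *elems, size_t count) {
  int width = MinByteWidth(count);
  for (size_t i = 0; i < count; i++) {
    int w = MinByteWidth(elems[i]);
    if (w > width) width = w;
  }
  return width;
}
```

With these helpers, the elements `1, 2, 3` fit in 1 byte each, while adding a single value such as `70000` forces every element of the same vector to 4 bytes.
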
Unlike FlatBuffers, there is only one kind of offset: an unsigned
integer indicating the number of bytes in a negative direction from the address
of the offset itself (i.e. where the offset is stored).

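As a minimal sketch of that rule, the helpers below read a little-endian unsigned field and follow an offset backwards from the address where it is stored. The names `ReadUInt` and `Indirect` are chosen for illustration here and are not claimed to match flexbuffers.h.

```cpp
#include <cstdint>

// Read an unsigned little-endian integer of `byte_width` (1, 2, 4 or 8) bytes.
inline uint64_t ReadUInt(const uint8_t *p, int byte_width) {
  uint64_t v = 0;
  for (int i = 0; i < byte_width; i++)
    v |= static_cast<uint64_t>(p[i]) << (8 * i);
  return v;
}

// Follow the offset stored at `p`: the target lies that many bytes
// *before* `p` in the buffer.
inline const uint8_t *Indirect(const uint8_t *p, int byte_width) {
  return p - ReadUInt(p, byte_width);
}
```

For a buffer `{42, 0, 0, 3}`, following the 1-byte offset stored at index 3 yields a pointer to index 0, i.e. the value `42`.
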
### Vectors

The representation of the vector is at the core of how FlexBuffers works (since
maps are really just a combination of 2 vectors), so it is worth starting there.

As mentioned, a vector is governed by a single bit width (supplied by its
parent). This includes the size field. For example, a vector that stores the
integer values `1, 2, 3` is encoded as follows:

    uint8_t 3, 1, 2, 3, 4, 4, 4

The first `3` is the size field, and is placed before the vector (an offset
from the parent to this vector points to the first element, not the size
field, so the size field is effectively at index -1).
Since this is an untyped vector (`TYPE_VECTOR`), it is followed by 3 type
bytes (one per element of the vector). Type bytes always follow the elements,
and each is always a `uint8_t`, even if the vector is made up of bigger scalars.

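The layout above can be read back with a small view type. This is a sketch only: `UntypedVectorView` is a hypothetical name rather than the flexbuffers.h API, and it reads every element as an unsigned integer regardless of its type byte.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical read-only view over an untyped vector.
struct UntypedVectorView {
  const uint8_t *data;  // points at element 0, as an offset from the parent would
  int byte_width;       // 1, 2, 4 or 8, supplied by the parent

  // Read one little-endian unsigned field of `byte_width` bytes.
  uint64_t ReadField(const uint8_t *p) const {
    uint64_t v = 0;
    for (int i = 0; i < byte_width; i++)
      v |= static_cast<uint64_t>(p[i]) << (8 * i);
    return v;
  }
  // The size field sits directly before element 0 ("index -1").
  uint64_t Size() const { return ReadField(data - byte_width); }
  uint64_t Element(size_t i) const { return ReadField(data + i * byte_width); }
  // Type bytes follow the elements and are always one byte each.
  uint8_t TypeByte(size_t i) const { return data[Size() * byte_width + i]; }
};
```

Pointing `data` at the second byte of `3, 1, 2, 3, 4, 4, 4` with a byte width of 1 gives a size of 3, the elements 1, 2 and 3, and the type byte 4 for each of them.
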
### Types

A type byte is made up of 2 components (see flexbuffers.h for exact values):

* 2 lower bits representing the bit-width of the child (8, 16, 32, 64).
  This is only used if the child is accessed over an offset, such as a child
  vector. It is ignored for inline types.
* 6 upper bits representing the actual type (see flexbuffers.h).

Thus, in the vector example above, `4` (i.e. `(1 << 2) | 0`) means 8 bit child
(value 0, unused, since the value is in-line), type `TYPE_INT` (value 1).

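Packing and unpacking a type byte can be sketched as below. The function names are hypothetical, and the mapping of width values 1, 2 and 3 to 16, 32 and 64 bits is an assumption based on the order in which the widths are listed; flexbuffers.h remains the authority for exact values.

```cpp
#include <cstdint>

// Combine a 6-bit type with a 2-bit width code into one type byte.
inline uint8_t PackType(int type, int width_code) {
  return static_cast<uint8_t>((type << 2) | (width_code & 3));
}

// Upper 6 bits: the actual type.
inline int TypeOf(uint8_t packed) { return packed >> 2; }

// Lower 2 bits: width code 0..3, assumed to mean 1, 2, 4 or 8 bytes.
inline int ChildByteWidth(uint8_t packed) { return 1 << (packed & 3); }
```

Under these assumptions, `PackType(1, 0)` produces the `4` seen in the vector example, and unpacking it gives type 1 with a 1-byte child width.
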
### Typed Vectors

These are like the Vectors above, but omit the type bytes. The type is instead
determined by the vector type supplied by the parent. Typed vectors are only
available for a subset of types for which these savings can be significant,
namely inline signed/unsigned integers (`TYPE_VECTOR_INT` / `TYPE_VECTOR_UINT`),
floats (`TYPE_VECTOR_FLOAT`), and keys (`TYPE_VECTOR_KEY`, see below).

Additionally, for scalars, there are fixed length vectors of sizes 2 / 3 / 4
that don't store the size (`TYPE_VECTOR_INT2` etc.), for additional savings
in space when storing common vector or color data.

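To make the savings concrete, the following sketch counts the bytes each vector flavor occupies. `VecKind` and `EncodedVectorBytes` are hypothetical names introduced for this illustration, and alignment padding is ignored.

```cpp
#include <cstddef>

enum class VecKind { Untyped, Typed, FixedTyped };

// Bytes taken by a vector of `count` elements of `byte_width` bytes each.
inline size_t EncodedVectorBytes(VecKind kind, size_t count, size_t byte_width) {
  size_t bytes = count * byte_width;                     // the elements
  if (kind != VecKind::FixedTyped) bytes += byte_width;  // size field
  if (kind == VecKind::Untyped) bytes += count;          // one type byte per element
  return bytes;
}
```

For three 1-byte integers this gives 7 bytes untyped (matching the `3, 1, 2, 3, 4, 4, 4` example), 4 bytes as a typed vector, and 3 bytes as a fixed length typed vector.
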
### Scalars

FlexBuffers supports integers (`TYPE_INT` and `TYPE_UINT`) and floats
(`TYPE_FLOAT`), available in the bit-widths mentioned above. They can be stored
both inline and over an offset (`TYPE_INDIRECT_*`).

The offset version is useful to encode costly 64-bit (or even 32-bit) quantities
into vectors / maps of smaller sizes, and to share / repeat a value multiple
times.

### Blobs, Strings and Keys

A blob (`TYPE_BLOB`) is encoded similarly to a vector, with one difference: the
elements are always `uint8_t`. The parent bit width only determines the width of
the size field, allowing blobs to be large without the elements being large.

Strings (`TYPE_STRING`) are similar to blobs, except they have an additional 0
termination byte for convenience, and they MUST be UTF-8 encoded (since an
accessor in a language that does not support pointers to UTF-8 data may have to
convert them to a native string type).

A "Key" (`TYPE_KEY`) is similar to a string, but doesn't store the size
field. Keys are so named because they are used with maps, which don't care
for the size, and can thus be even more compact. Unlike strings, keys cannot
contain bytes of value 0 as part of their data (size can only be determined by
`strlen`), so while you can use them outside the context of maps if you so
desire, you're usually better off with strings.

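The difference between a sized string and a key can be sketched like this. `ReadString` and `ReadKey` are hypothetical helpers, and the sketch assumes the size field counts the data bytes only, not the 0 terminator.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// `p` points at the first data byte; the size field sits `byte_width`
// bytes before it and is stored little-endian.
inline std::string ReadString(const uint8_t *p, int byte_width) {
  uint64_t size = 0;
  for (int i = 0; i < byte_width; i++)
    size |= static_cast<uint64_t>(p[i - byte_width]) << (8 * i);
  return std::string(reinterpret_cast<const char *>(p),
                     static_cast<size_t>(size));
}

// A key has no size field; its length is found by scanning for the
// 0 terminator, which is why a key cannot contain a 0 byte.
inline std::string ReadKey(const uint8_t *p) {
  const char *s = reinterpret_cast<const char *>(p);
  return std::string(s, std::strlen(s));
}
```

Because `ReadString` trusts the size field, it can return data with an embedded 0 byte, whereas `ReadKey` always stops at the first 0.
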
### Maps

A map (`TYPE_MAP`) is like an (untyped) vector, but with 2 prefixes before the
size field:

| index | field                                                      |
| ----: | :--------------------------------------------------------- |
| -3    | An offset to the keys vector (may be shared between maps). |
| -2    | Byte width of the keys vector.                             |
| -1    | Size (from here on it is compatible with `TYPE_VECTOR`)    |
| 0     | Elements.                                                  |
| Size  | Types.                                                     |

Since a map is otherwise the same as a vector, it can be iterated like
a vector (which is probably faster than lookup by key).

The keys vector is a typed vector of keys. Both the keys and corresponding
values *have* to be stored in sorted order (as determined by `strcmp`), such
that lookups can be made using binary search.

The reason the key vector is a separate structure from the value vector is
so that it can be shared between multiple value vectors, and also to
allow it to be treated as its own individual vector in code.

An example map { foo: 13, bar: 14 } would be encoded as follows (note that
`bar` sorts before `foo`, so it comes first in both vectors):

    0 : uint8_t 'b', 'a', 'r', 0
    4 : uint8_t 'f', 'o', 'o', 0
    8 : uint8_t 2      // key vector of size 2
    // key vector offset points here
    9 : uint8_t 9, 6   // offsets to bar_key and foo_key
    11: uint8_t 2, 1   // offset to key vector, and its byte width
    13: uint8_t 2      // value vector of size 2
    // value vector offset points here
    14: uint8_t 14, 13 // values
    16: uint8_t 4, 4   // types

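A lookup by key over such a layout can be sketched as a binary search. This is a simplified illustration limited to 1-byte widths: `kExampleMap`, `Follow` and `MapFind` are hypothetical names, and the buffer stores the keys in `strcmp` order (`bar` before `foo`) with the key vector offset pointing at the first key offset.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical encoding of { foo: 13, bar: 14 } with all widths at 1 byte.
static const uint8_t kExampleMap[] = {
    'b', 'a', 'r', 0,  // 0 : key "bar"
    'f', 'o', 'o', 0,  // 4 : key "foo"
    2,                 // 8 : key vector size
    9, 6,              // 9 : offsets to "bar" and "foo"
    2, 1,              // 11: offset to key vector, and its byte width
    2,                 // 13: value vector size
    14, 13,            // 14: values (value vector offset points here)
    4, 4               // 16: types
};

// Follow a 1-byte offset stored at `p` (the target lies `*p` bytes before `p`).
inline const uint8_t *Follow(const uint8_t *p) { return p - *p; }

// Binary search for `key`; `values` points at element 0 of the value vector.
// Returns a pointer to the matching value, or nullptr if the key is absent.
inline const uint8_t *MapFind(const uint8_t *values, const char *key) {
  const uint8_t *keys = Follow(values - 3);  // index -3: offset to keys vector
  int lo = 0;
  int hi = static_cast<int>(values[-1]) - 1;  // index -1: size
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    const char *k = reinterpret_cast<const char *>(Follow(keys + mid));
    int cmp = std::strcmp(key, k);
    if (cmp == 0) return values + mid;
    if (cmp < 0) hi = mid - 1; else lo = mid + 1;
  }
  return nullptr;
}
```

Calling `MapFind(kExampleMap + 14, "foo")` returns a pointer to the value 13, and a key that is not present yields `nullptr`.
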
### The root

As mentioned, the root starts at the end of the buffer.
The last uint8_t is the width in bytes of the root (normally the parent
determines the width, but the root has no parent). The uint8_t before this is
the type of the root, and the bytes before that are the root value (of the
number of bytes specified by the last byte).

So for example, the integer value `13` as root would be:

    uint8_t 13, 4, 1    // Value, type, root byte width.

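Locating the root from the end of a buffer can be sketched as below. `RootInfo` and `GetRoot` are hypothetical names, and the sketch performs no bounds checking, so it assumes a well-formed buffer of at least `byte_width + 2` bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical description of where the root lives in a buffer.
struct RootInfo {
  const uint8_t *value;  // first byte of the root value
  uint8_t packed_type;   // type byte of the root
  uint8_t byte_width;    // width in bytes of the root value
};

inline RootInfo GetRoot(const uint8_t *buf, size_t size) {
  RootInfo r;
  r.byte_width = buf[size - 1];                    // last byte: root width
  r.packed_type = buf[size - 2];                   // before that: root type
  r.value = buf + size - 2 - r.byte_width;         // before that: root value
  return r;
}
```

Applied to the bytes `13, 4, 1`, this yields a byte width of 1, a type byte of 4, and a value pointer at the byte holding `13`.
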
<br>