mirror of
https://github.com/google/flatbuffers.git
synced 2026-06-07 22:03:40 +00:00
First attempt at SchemaLess FlatBuffers.
Change-Id: I86b9d002f3441ef9efdb70e059b8530ab2d74bb8 Tested: on Linux.
This commit is contained in:
committed by
Wouter van Oortmerssen
parent
dabe030890
commit
aac6be1153
@@ -30,6 +30,7 @@ set(FlatBuffers_Library_SRCS
|
||||
include/flatbuffers/util.h
|
||||
include/flatbuffers/reflection.h
|
||||
include/flatbuffers/reflection_generated.h
|
||||
include/flatbuffers/flexbuffers.h
|
||||
src/code_generators.cpp
|
||||
src/idl_parser.cpp
|
||||
src/idl_gen_text.cpp
|
||||
|
||||
@@ -78,6 +78,9 @@ inefficiency, but also forces you to write *more* code to access data
|
||||
In this context, it is only a better choice for systems that have very
|
||||
little to no information ahead of time about what data needs to be stored.
|
||||
|
||||
If you do need to store data that doesn't fit a schema, FlatBuffers also
|
||||
offers a schema-less (self-describing) version!
|
||||
|
||||
Read more about the "why" of FlatBuffers in the
|
||||
[white paper](@ref flatbuffers_white_paper).
|
||||
|
||||
@@ -138,6 +141,8 @@ sections provide a more in-depth usage guide.
|
||||
using FlatBuffers.
|
||||
- A [white paper](@ref flatbuffers_white_paper) explaining the "why" of
|
||||
FlatBuffers.
|
||||
- How to use the [schema-less](@ref flexbuffers) version of
|
||||
FlatBuffers.
|
||||
- A description of the [internals](@ref flatbuffers_internals) of FlatBuffers.
|
||||
- A formal [grammar](@ref flatbuffers_grammar) of the schema language.
|
||||
|
||||
|
||||
156
docs/source/FlexBuffers.md
Normal file
156
docs/source/FlexBuffers.md
Normal file
@@ -0,0 +1,156 @@
|
||||
FlexBuffers {#flexbuffers}
|
||||
==========
|
||||
|
||||
FlatBuffers was designed around schemas, because when you want maximum
|
||||
performance and data consistency, strong typing is helpful.
|
||||
|
||||
There are however times when you want to store data that doesn't fit a
|
||||
schema, because you can't know ahead of time what all needs to be stored.
|
||||
|
||||
For this, FlatBuffers has a dedicated format, called FlexBuffers.
|
||||
This is a binary format that can be used in conjunction
|
||||
with FlatBuffers (by storing a part of a buffer in FlexBuffers
|
||||
format), or also as its own independent serialization format.
|
||||
|
||||
While it loses the strong typing, you retain the most unique advantage
|
||||
FlatBuffers has over other serialization formats (schema-based or not):
|
||||
FlexBuffers can also be accessed without parsing / copying / object allocation.
|
||||
This is a huge win in efficiency / memory friendly-ness, and allows unique
|
||||
use cases such as mmap-ing large amounts of free-form data.
|
||||
|
||||
FlexBuffers design and implementation allows for a very compact encoding,
|
||||
combining automatic pooling of strings with automatic sizing of containers to
|
||||
their smallest possible representation (8/16/32/64 bits). Many values and
|
||||
offsets can be encoded in just 8 bits. While a schema-less representation is
|
||||
usually more bulky because of the need to be self-descriptive, FlexBuffers
|
||||
generates smaller binaries for many cases than regular FlatBuffers.
|
||||
|
||||
FlexBuffers is still slower than regular FlatBuffers though, so we recommend to
|
||||
only use it if you need it.
|
||||
|
||||
|
||||
# Usage
|
||||
|
||||
This is for C++, other languages may follow.
|
||||
|
||||
Include the header `flexbuffers.h`, which in turn depends on `flatbuffers.h`
|
||||
and `util.h`.
|
||||
|
||||
To create a buffer:
|
||||
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
|
||||
flexbuffers::Builder fbb;
|
||||
fbb.Int(13);
|
||||
fbb.Finish();
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
You create any value, followed by `Finish`. Unlike FlatBuffers which requires
|
||||
the root value to be a table, here any value can be the root, including a lonely
|
||||
int value.
|
||||
|
||||
You can now access the `std::vector<uint8_t>` that contains the encoded value
|
||||
as `fbb.GetBuffer()`. Write it, send it, or store it in a parent FlatBuffer. In
|
||||
this case, the buffer is just 3 bytes in size.
|
||||
|
||||
To read this value back, you could just say:
|
||||
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
|
||||
auto root = flexbuffers::GetRoot(my_buffer);
|
||||
int64_t i = root.AsInt64();
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
FlexBuffers stores ints only as big as needed, so it doesn't differentiate
|
||||
between different sizes of ints. You can ask for the 64 bit version,
|
||||
regardless of what you put in. In fact, since you demand to read the root
|
||||
as an int, if you supply a buffer that actually contains a float, or a
|
||||
string with numbers in it, it will convert it for you on the fly as well,
|
||||
or return 0 if it can't. If instead you actually want to know what is inside
|
||||
the buffer before you access it, you can call `root.GetType()` or `root.IsInt()`
|
||||
etc.
|
||||
|
||||
Here's a slightly more complex value you could write instead of `fbb.Int` above:
|
||||
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
|
||||
fbb.Map([&]() {
|
||||
fbb.Vector("vec", [&]() {
|
||||
fbb.Int(-100);
|
||||
fbb.String("Fred");
|
||||
fbb.IndirectFloat(4.0f);
|
||||
});
|
||||
fbb.UInt("foo", 100);
|
||||
});
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This stores the equivalent of the JSON value
|
||||
`{ vec: [ -100, "Fred", 4.0 ], foo: 100 }`. The root is a dictionary that has
|
||||
just two key-value pairs, with keys `vec` and `foo`. Unlike FlatBuffers, it
|
||||
actually has to store these keys in the buffer (which it does only once if
|
||||
you store multiple such objects, by pooling key values), but also unlike
|
||||
FlatBuffers it has no restriction on the keys (fields) that you use.
|
||||
|
||||
The map constructor uses a C++11 Lambda to group its children, but you can
|
||||
also use more conventional start/end calls if you prefer.
|
||||
|
||||
The first value in the map is a vector. You'll notice that unlike FlatBuffers,
|
||||
you can use mixed types. There is also a `TypedVector` variant that only
|
||||
allows a single type, and uses a bit less memory.
|
||||
|
||||
`IndirectFloat` is an interesting feature that allows you to store values
|
||||
by offset rather than inline. Though that doesn't make any visible change
|
||||
to the user, the consequence is that large values (especially doubles or
|
||||
64 bit ints) that occur more than once can be shared. Another use case is
|
||||
inside of vectors, where the largest element makes up the size of all elements
|
||||
(e.g. a single double forces all elements to 64bit), so storing a lot of small
|
||||
integers together with a double is more efficient if the double is indirect.
|
||||
|
||||
Accessing it:
|
||||
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~{.cpp}
|
||||
auto map = flexbuffers::GetRoot(my_buffer).AsMap();
|
||||
map.size(); // 2
|
||||
auto vec = map["vec"].AsVector();
|
||||
vec.size(); // 3
|
||||
vec[0].AsInt64(); // -100;
|
||||
vec[1].AsString().c_str(); // "Fred";
|
||||
vec[1].AsInt64(); // 0 (Number parsing failed).
|
||||
vec[2].AsDouble(); // 4.0
|
||||
vec[2].AsString().IsTheEmptyString(); // true (Wrong Type).
|
||||
vec[2].AsString().c_str(); // "" (This still works though).
|
||||
vec[2].ToString().c_str(); // "4" (Or have it converted).
|
||||
map["foo"].AsUInt8(); // 100
|
||||
map["unknown"].IsNull(); // true
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
# Binary encoding
|
||||
|
||||
A description of how FlexBuffers are encoded is in the
|
||||
[internals](@ref flatbuffers_internals) document.
|
||||
|
||||
|
||||
# Efficiency tips
|
||||
|
||||
* Vectors generally are a lot more efficient than maps, so prefer them over maps
|
||||
when possible for small objects. Instead of a map with keys `x`, `y` and `z`,
|
||||
use a vector. Better yet, use a typed vector. Or even better, use a fixed
|
||||
size typed vector.
|
||||
* Maps are backwards compatible with vectors, and can be iterated as such.
|
||||
You can iterate either just the values (`map.Values()`), or in parallel with
|
||||
the keys vector (`map.Keys()`). If you intend
|
||||
to access most or all elements, this is faster than looking up each element
|
||||
by key, since that involves a binary search of the key vector.
|
||||
* When possible, don't mix values that require a big bit width (such as double)
|
||||
in a large vector of smaller values, since all elements will take on this
|
||||
width. Use `IndirectDouble` when this is a possibility. Note that
|
||||
integers automatically use the smallest width possible, i.e. if you ask
|
||||
to serialize an int64_t whose value is actually small, you will use less
|
||||
bits. Doubles are represented as floats whenever possible losslessly, but
|
||||
this is only possible for few values.
|
||||
Since nested vectors/maps are stored over offsets, they typically don't
|
||||
affect the vector width.
|
||||
* To store large arrays of byte data, use a blob. If you'd use a typed
|
||||
vector, the bit width of the size field may make it use more space than
|
||||
expected, and may not be compatible with `memcpy`.
|
||||
Similarly, large arrays of (u)int16_t may be better off stored as a
|
||||
binary blob if their size could exceed 64k elements.
|
||||
Construction and use are otherwise similar to strings.
|
||||
@@ -292,4 +292,148 @@ flexibility in which of the children of root object to write first (though in
|
||||
this case there's only one string), and what order to write the fields in.
|
||||
Different orders may also cause different alignments to happen.
|
||||
|
||||
# FlexBuffers
|
||||
|
||||
The [schema-less](@ref flexbuffers) version of FlatBuffers have their
|
||||
own encoding, detailed here.
|
||||
|
||||
It shares many properties mentioned above, in that all data is accessed
|
||||
over offsets, all scalars are aligned to their own size, and
|
||||
all data is always stored in little endian format.
|
||||
|
||||
One difference is that FlexBuffers are built front to back, so children are
|
||||
stored before parents, and the root of the data starts at the last byte.
|
||||
|
||||
Another difference is that scalar data is stored with a variable number of bits
|
||||
(8/16/32/64). The current width is always determined by the *parent*, i.e. if
|
||||
the scalar sits in a vector, the vector determines the bit width for all
|
||||
elements at once. Selecting the minimum bit width for a particular vector is
|
||||
something the encoder does automatically and thus is typically of no concern
|
||||
to the user, though being aware of this feature (and not sticking a double in
|
||||
the same vector as a bunch of byte sized elements) is helpful for efficiency.
|
||||
|
||||
Unlike FlatBuffers there is only one kind of offset, and that is an unsigned
|
||||
integer indicating the number of bytes in a negative direction from the address
|
||||
of itself (where the offset is stored).
|
||||
|
||||
### Vectors
|
||||
|
||||
The representation of the vector is at the core of how FlexBuffers works (since
|
||||
maps are really just a combination of 2 vectors), so it is worth starting there.
|
||||
|
||||
As mentioned, a vector is governed by a single bit width (supplied by its
|
||||
parent). This includes the size field. For example, a vector that stores the
|
||||
integer values `1, 2, 3` is encoded as follows:
|
||||
|
||||
uint8_t 3, 1, 2, 3, 4, 4, 4
|
||||
|
||||
The first `3` is the size field, and is placed before the vector (an offset
|
||||
from the parent to this vector points to the first element, not the size
|
||||
field, so the size field is effectively at index -1).
|
||||
Since this is an untyped vector `SL_VECTOR`, it is followed by 3 type
|
||||
bytes (one per element of the vector), which are always following the vector,
|
||||
and are always a uint8_t even if the vector is made up of bigger scalars.
|
||||
|
||||
### Types
|
||||
|
||||
A type byte is made up of 2 components (see flexbuffers.h for exact values):
|
||||
|
||||
* 2 lower bits representing the bit-width of the child (8, 16, 32, 64).
|
||||
This is only used if the child is accessed over an offset, such as a child
|
||||
vector. It is ignored for inline types.
|
||||
* 6 bits representing the actual type (see flexbuffers.h).
|
||||
|
||||
Thus, in this example `4` means 8 bit child (value 0, unused, since the value is
|
||||
in-line), type `SL_INT` (value 1).
|
||||
|
||||
### Typed Vectors
|
||||
|
||||
These are like the Vectors above, but omit the type bytes. The type is instead
|
||||
determined by the vector type supplied by the parent. Typed vectors are only
|
||||
available for a subset of types for which these savings can be significant,
|
||||
namely inline signed/unsigned integers (`TYPE_VECTOR_INT` / `TYPE_VECTOR_UINT`),
|
||||
floats (`TYPE_VECTOR_FLOAT`), and keys (`TYPE_VECTOR_KEY`, see below).
|
||||
|
||||
Additionally, for scalars, there are fixed length vectors of sizes 2 / 3 / 4
|
||||
that don't store the size (`TYPE_VECTOR_INT2` etc.), for an additional savings
|
||||
in space when storing common vector or color data.
|
||||
|
||||
### Scalars
|
||||
|
||||
FlexBuffers supports integers (`TYPE_INT` and `TYPE_UINT`) and floats
|
||||
(`TYPE_FLOAT`), available in the bit-widths mentioned above. They can be stored
|
||||
both inline and over an offset (`TYPE_INDIRECT_*`).
|
||||
|
||||
The offset version is useful to encode costly 64bit (or even 32bit) quantities
|
||||
into vectors / maps of smaller sizes, and to share / repeat a value multiple
|
||||
times.
|
||||
|
||||
### Blobs, Strings and Keys.
|
||||
|
||||
A blob (`TYPE_BLOB`) is encoded similar to a vector, with one difference: the
|
||||
elements are always `uint8_t`. The parent bit width only determines the width of
|
||||
the size field, allowing blobs to be large without the elements being large.
|
||||
|
||||
Strings (`TYPE_STRING`) are similar to blobs, except they have an additional 0
|
||||
termination byte for convenience, and they MUST be UTF-8 encoded (since an
|
||||
accessor in a language that does not support pointers to UTF-8 data may have to
|
||||
convert them to a native string type).
|
||||
|
||||
A "Key" (`TYPE_KEY`) is similar to a string, but doesn't store the size
|
||||
field. They're so named because they are used with maps, which don't care
|
||||
for the size, and can thus be even more compact. Unlike strings, keys cannot
|
||||
contain bytes of value 0 as part of their data (size can only be determined by
|
||||
`strlen`), so while you can use them outside the context of maps if you so
|
||||
desire, you're usually better off with strings.
|
||||
|
||||
### Maps
|
||||
|
||||
A map (`TYPE_MAP`) is like an (untyped) vector, but with 2 prefixes before the
|
||||
size field:
|
||||
|
||||
| index | field |
|
||||
| ----: | :----------------------------------------------------------- |
|
||||
| -3 | An offset to the keys vector (may be shared between tables). |
|
||||
| -2 | Byte width of the keys vector. |
|
||||
| -1 | Size (from here on it is compatible with `TYPE_VECTOR`) |
|
||||
| 0 | Elements. |
|
||||
| Size | Types. |
|
||||
|
||||
Since a map is otherwise the same as a vector, it can be iterated like
|
||||
a vector (which is probably faster than lookup by key).
|
||||
|
||||
The keys vector is a typed vector of keys. Both the keys and corresponding
|
||||
values *have* to be stored in sorted order (as determined by `strcmp`), such
|
||||
that lookups can be made using binary search.
|
||||
|
||||
The reason the key vector is a seperate structure from the value vector is
|
||||
such that it can be shared between multiple value vectors, and also to
|
||||
allow it to be treated as its own indivual vector in code.
|
||||
|
||||
An example map { foo: 13, bar: 14 } would be encoded as:
|
||||
|
||||
0 : uint8_t 'f', 'o', 'o', 0
|
||||
4 : uint8_t 'b', 'a', 'r', 0
|
||||
8 : uint8_t 2 // key vector of size 2
|
||||
// key vector offset points here
|
||||
9 : uint8_t 9, 6 // offsets to foo_key and bar_key
|
||||
11: uint8_t 3, 1 // offset to key vector, and its byte width
|
||||
13: uint8_t 2 // value vector of size
|
||||
// value vector offset points here
|
||||
14: uint8_t 13, 14 // values
|
||||
16: uint8_t 4, 4 // types
|
||||
|
||||
### The root
|
||||
|
||||
As mentioned, the root starts at the end of the buffer.
|
||||
The last uint8_t is the width in bytes of the root (normally the parent
|
||||
determines the width, but the root has no parent). The uint8_t before this is
|
||||
the type of the root, and the bytes before that are the root value (of the
|
||||
number of bytes specified by the last byte).
|
||||
|
||||
So for example, the integer value `13` as root would be:
|
||||
|
||||
uint8_t 13, 4, 1 // Value, type, root byte width.
|
||||
|
||||
|
||||
<br>
|
||||
|
||||
@@ -759,6 +759,7 @@ INPUT = "FlatBuffers.md" \
|
||||
"Support.md" \
|
||||
"Benchmarks.md" \
|
||||
"WhitePaper.md" \
|
||||
"FlexBuffers.md" \
|
||||
"Internals.md" \
|
||||
"Grammar.md" \
|
||||
"../../CONTRIBUTING.md" \
|
||||
|
||||
@@ -37,6 +37,8 @@
|
||||
title="Use in PHP"/>
|
||||
<tab type="user" url="@ref flatbuffers_guide_use_python"
|
||||
title="Use in Python"/>
|
||||
<tab type="user" url="@ref flexbuffers"
|
||||
title="Schema-less version"/>
|
||||
</tab>
|
||||
<tab type="user" url="@ref flatbuffers_support"
|
||||
title="Platform / Language / Feature support"/>
|
||||
|
||||
1332
include/flatbuffers/flexbuffers.h
Normal file
1332
include/flatbuffers/flexbuffers.h
Normal file
File diff suppressed because it is too large
Load Diff
@@ -27,6 +27,8 @@
|
||||
#include <random>
|
||||
#endif
|
||||
|
||||
#include "flatbuffers/flexbuffers.h"
|
||||
|
||||
using namespace MyGame::Example;
|
||||
|
||||
#ifdef __ANDROID__
|
||||
@@ -491,8 +493,6 @@ void ReflectionTest(uint8_t *flatbuf, size_t length) {
|
||||
TEST_NOTNULL(pos_table_ptr);
|
||||
TEST_EQ_STR(pos_table_ptr->name()->c_str(), "MyGame.Example.Vec3");
|
||||
|
||||
|
||||
|
||||
// Now use it to dynamically access a buffer.
|
||||
auto &root = *flatbuffers::GetAnyRoot(flatbuf);
|
||||
|
||||
@@ -1360,6 +1360,66 @@ void ConformTest() {
|
||||
test_conform("enum E:byte { B, A }", "values differ for enum");
|
||||
}
|
||||
|
||||
void FlexBuffersTest() {
|
||||
flexbuffers::Builder slb(512,
|
||||
flexbuffers::BUILDER_FLAG_SHARE_KEYS_AND_STRINGS);
|
||||
|
||||
// Write the equivalent of:
|
||||
// { vec: [ -100, "Fred", 4.0 ], bar: [ 1, 2, 3 ], foo: 100 }
|
||||
slb.Map([&]() {
|
||||
slb.Vector("vec", [&]() {
|
||||
slb += -100; // Equivalent to slb.Add(-100) or slb.Int(-100);
|
||||
slb += "Fred";
|
||||
slb.IndirectFloat(4.0f);
|
||||
});
|
||||
std::vector<int> ints = { 1, 2, 3 };
|
||||
slb.Add("bar", ints);
|
||||
slb.FixedTypedVector("bar3", ints.data(), ints.size()); // Static size.
|
||||
slb.Double("foo", 100);
|
||||
slb.Map("mymap", [&]() {
|
||||
slb.String("foo", "Fred"); // Testing key and string reuse.
|
||||
});
|
||||
});
|
||||
slb.Finish();
|
||||
|
||||
for (size_t i = 0; i < slb.GetBuffer().size(); i++)
|
||||
printf("%d ", slb.GetBuffer().data()[i]);
|
||||
printf("\n");
|
||||
|
||||
auto map = flexbuffers::GetRoot(slb.GetBuffer()).AsMap();
|
||||
TEST_EQ(map.size(), 5);
|
||||
auto vec = map["vec"].AsVector();
|
||||
TEST_EQ(vec.size(), 3);
|
||||
TEST_EQ(vec[0].AsInt64(), -100);
|
||||
TEST_EQ_STR(vec[1].AsString().c_str(), "Fred");
|
||||
TEST_EQ(vec[1].AsInt64(), 0); // Number parsing failed.
|
||||
TEST_EQ(vec[2].AsDouble(), 4.0);
|
||||
TEST_EQ(vec[2].AsString().IsTheEmptyString(), true); // Wrong Type.
|
||||
TEST_EQ_STR(vec[2].AsString().c_str(), ""); // This still works though.
|
||||
TEST_EQ_STR(vec[2].ToString().c_str(), "4"); // Or have it converted.
|
||||
auto tvec = map["bar"].AsTypedVector();
|
||||
TEST_EQ(tvec.size(), 3);
|
||||
TEST_EQ(tvec[2].AsInt8(), 3);
|
||||
auto tvec3 = map["bar3"].AsFixedTypedVector();
|
||||
TEST_EQ(tvec3.size(), 3);
|
||||
TEST_EQ(tvec3[2].AsInt8(), 3);
|
||||
TEST_EQ(map["foo"].AsUInt8(), 100);
|
||||
TEST_EQ(map["unknown"].IsNull(), true);
|
||||
auto mymap = map["mymap"].AsMap();
|
||||
// These should be equal by pointer equality, since key and value are shared.
|
||||
TEST_EQ(mymap.Keys()[0].AsKey(), map.Keys()[2].AsKey());
|
||||
TEST_EQ(mymap.Values()[0].AsString().c_str(), vec[1].AsString().c_str());
|
||||
// We can mutate values in the buffer.
|
||||
TEST_EQ(vec[0].MutateInt(-99), true);
|
||||
TEST_EQ(vec[0].AsInt64(), -99);
|
||||
TEST_EQ(vec[1].MutateString("John"), true); // Size must match.
|
||||
TEST_EQ_STR(vec[1].AsString().c_str(), "John");
|
||||
TEST_EQ(vec[1].MutateString("Alfred"), false); // Too long.
|
||||
TEST_EQ(vec[2].MutateFloat(2.0f), true);
|
||||
TEST_EQ(vec[2].AsFloat(), 2.0f);
|
||||
TEST_EQ(vec[2].MutateFloat(3.14159), false); // Double does not fit in float.
|
||||
}
|
||||
|
||||
int main(int /*argc*/, const char * /*argv*/[]) {
|
||||
// Run our various test suites:
|
||||
|
||||
@@ -1399,6 +1459,8 @@ int main(int /*argc*/, const char * /*argv*/[]) {
|
||||
ParseUnionTest();
|
||||
ConformTest();
|
||||
|
||||
FlexBuffersTest();
|
||||
|
||||
if (!testing_fails) {
|
||||
TEST_OUTPUT_LINE("ALL TESTS PASSED");
|
||||
return 0;
|
||||
|
||||
Reference in New Issue
Block a user