mirror of
https://github.com/google/flatbuffers.git
synced 2026-06-05 21:17:25 +00:00
Initial commit of the FlatBuffers code.
Change-Id: I4c9f0f722490b374257adb3fec63e44ae93da920 Tested: using VS2010 / Xcode / gcc on Linux.
49
docs/source/Benchmarks.md
Executable file
@@ -0,0 +1,49 @@
|
||||
# Benchmarks
|
||||
|
||||
The benchmarks below compare FlatBuffers against other serialization
solutions, running on 64-bit Windows 7. We use the LITE runtime for Protocol
Buffers (less code / lower overhead), and Rapid JSON, one of the fastest C++
JSON parsers around.

We measure FlatBuffers with the binary wire format (as intended), and also
with JSON as the wire format using the optional JSON parser (which, using a
schema, parses JSON into a binary buffer that can then be accessed as before).

The benchmark object is a set of about 10 objects containing an array, 4
strings, and a large variety of int/float scalar values of all sizes,
meant to be representative of game data, e.g. a scene format.
|
||||
|
||||
|                                                        | FlatBuffers (binary)  | Protocol Buffers LITE | Rapid JSON            | FlatBuffers (JSON)    |
|--------------------------------------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| Decode + Traverse + Dealloc (1 million times, seconds) | 0.08                  | 305                   | 583                   | 105                   |
| Decode / Traverse / Dealloc (breakdown, seconds)       | 0 / 0.08 / 0          | 220 / 3.6 / 81        | 294 / 0.9 / 287       | 70 / 0.08 / 35        |
| Encode (1 million times, seconds)                      | 3.2                   | 185                   | 650                   | 169                   |
| Wire format size (normal / zlib, bytes)                | 344 / 220             | 228 / 174             | 1475 / 322            | 1029 / 298            |
| Memory needed to store decoded wire (bytes / blocks)   | 0 / 0                 | 760 / 20              | 65689 / 40            | 328 / 1               |
| Transient memory allocated during decode (KB)          | 0                     | 1                     | 131                   | 4                     |
| Generated source code size (KB)                        | 4                     | 61                    | 0                     | 4                     |
| Field access in handwritten traversal code             | accessors             | accessors             | manual error checking | accessors             |
| Library source code (KB)                               | 15                    | some subset of 3800   | 87                    | 43                    |
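Timings of the "1 million times, seconds" kind are usually collected with a loop of roughly the following shape. This is only a sketch of such a harness, not the code used to produce the numbers above; `TimeIterations` is a hypothetical helper name, and the workload is whatever decode, traverse or encode step you want to measure.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of FlatBuffers): runs `work` `iterations`
// times and returns the elapsed wall-clock time in seconds.
template <typename Work>
double TimeIterations(Work work, std::size_t iterations) {
  assert(iterations > 0);
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i) work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

A call such as `TimeIterations(decode_and_traverse, 1000000)`, with `decode_and_traverse` being your own callable, would correspond to one cell of the table.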
|
||||
|
||||
### Some other serialization systems we compared against but did not benchmark (yet), in rough order of applicability:

- Cap'n'Proto promises to reduce the overhead of Protocol Buffers much like
  FlatBuffers does, though with a more complicated binary encoding and less
  flexibility (no optional fields to allow deprecating fields or serializing
  with missing fields for which defaults exist).
  It currently also isn't fully cross-platform portable (lack of VS support).
- msgpack: has very minimal forwards/backwards compatibility support when used
  with the typed C++ interface. Also lacks VS2010 support.
- Thrift: very similar to Protocol Buffers, but appears to be less efficient,
  and to have more dependencies.
- XML: typically even slower than JSON, but has the advantage that it can be
  parsed with a schema to reduce error-checking boilerplate code.
- YAML: a superset of JSON and otherwise very similar. Used by e.g. Unity.
- C# comes with built-in serialization functionality, as used by Unity also.
  Being tied to the language, and having no automatic versioning support,
  limits its applicability.
- Project Anarchy (the free mobile engine by Havok) comes with a serialization
  system that, however, does no automatic versioning (you have to code around
  new fields manually), is very much tied to the rest of the engine, and works
  without a schema to generate code (tied to your C++ class definition).
|
||||
|
||||
45
docs/source/Building.md
Executable file
@@ -0,0 +1,45 @@
|
||||
# Building
|
||||
|
||||
The system comes with a `cmake` file that should allow you to build the
compiler `flatc` and the tests (optionally). For details on `cmake`, see
<http://www.cmake.org>. In brief, depending on your platform, use one of
e.g.:

    cmake -G "Unix Makefiles"
    cmake -G "Visual Studio 10"
    cmake -G "Xcode"

Then, build as normal for your platform. This should result in a `flatc`
executable, essential for the next steps.
Note that to use clang instead of gcc, you may need to set up your environment
variables, e.g.
`CC=/usr/bin/clang CXX=/usr/bin/clang++ cmake -G "Unix Makefiles"`.

Optionally, run the `flattests` executable
to ensure everything is working correctly on your system. If this fails,
please contact us!
|
||||
|
||||
The cmake file will also build two sample executables, `sample_binary` and
|
||||
`sample_text`, see the corresponding `.cpp` file in the samples directory.
|
||||
|
||||
There is an `android` directory that contains all you need to build the test
executable on Android (use the included `build_apk.sh` script, or use
`ndk-build` / `adb` etc. as usual). Upon running, it will output to the log
whether the tests succeeded or not.
|
||||
|
||||
There is usually no runtime to compile, as the code consists of a single
header, `include/flatbuffers/flatbuffers.h`. You should add the
`include` folder to your include paths. If you wish to be
able to load schemas and/or parse text into binary buffers at runtime,
you additionally need the other headers in `include/flatbuffers`. You must
also compile/link `src/idl_parser.cpp` (and `src/idl_gen_text.cpp` if you
also want to be able to convert binary to text).
|
||||
|
||||
For applications on Google Play that integrate this library, usage is tracked.
|
||||
This tracking is done automatically using the embedded version string
|
||||
(flatbuffer_version_string), and helps us continue to optimize it.
|
||||
Aside from consuming a few extra bytes in your application binary, it shouldn't
|
||||
affect your application at all. We use this information to let us know if
|
||||
FlatBuffers is useful and if we should continue to invest in it. Since this is
|
||||
open source, you are free to remove the version string but we would appreciate
|
||||
if you would leave it in.
|
||||
22
docs/source/Compiler.md
Executable file
@@ -0,0 +1,22 @@
|
||||
# Using the schema compiler
|
||||
|
||||
Usage:

    flatc [ -c ] [ -j ] [ -b ] [ -t ] file1 file2 ..

The files are read and parsed in order, and can contain either schemas
or data (see below). Later files can make use of definitions in earlier
files. Depending on the flags passed, additional files may
be generated for each file processed:

- `-c` : Generate a C++ header for all definitions in this file (as
  `filename_generated.h`). Skips data.

- `-j` : Generate Java classes.

- `-b` : If data is contained in this file, generate a
  `filename_wire.bin` containing the binary flatbuffer.

- `-t` : If data is contained in this file, generate a
  `filename_wire.txt` (for debugging).
|
||||
|
||||
226
docs/source/CppUsage.md
Executable file
@@ -0,0 +1,226 @@
|
||||
# Use in C++
|
||||
|
||||
Assuming you have written a schema using the above language in, say,
`mygame.fbs` (FlatBuffer Schema, though the extension doesn't matter),
and have generated a C++ header called `mygame_generated.h` using the
compiler (e.g. `flatc -c mygame.fbs`), you can now start using this in
your program by including the header. As noted, this header relies on
`flatbuffers/flatbuffers.h`, which should be in your include path.
|
||||
|
||||
### Writing in C++
|
||||
|
||||
To start creating a buffer, create an instance of `FlatBufferBuilder`
|
||||
which will contain the buffer as it grows:
|
||||
|
||||
FlatBufferBuilder fbb;
|
||||
|
||||
Before we serialize a Monster, we need to first serialize any objects
that are contained therein, i.e. we serialize the data tree using
depth first, pre-order traversal. This is generally easy to do on
any tree structure. For example:

    auto name = fbb.CreateString("MyMonster");

    unsigned char inv[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    auto inventory = fbb.CreateVector(inv, 10);
|
||||
|
||||
`CreateString` and `CreateVector` serialize these two built-in
|
||||
datatypes, and return offsets into the serialized data indicating where
|
||||
they are stored, such that `Monster` below can refer to them.
|
||||
|
||||
`CreateString` can also take an `std::string`, or a `const char *` with
|
||||
an explicit length, and is suitable for holding UTF-8 and binary
|
||||
data if needed.
|
||||
|
||||
`CreateVector` can also take an `std::vector`. The
offset it returns is typed, i.e. can only be used to set fields of the
correct type below. To create a vector of struct objects (which will
be stored as contiguous memory in the buffer), use `CreateVectorOfStructs`
instead.
|
||||
|
||||
Vec3 vec(1, 2, 3);
|
||||
|
||||
`Vec3` is the first example of code from our generated
|
||||
header. Structs (unlike tables) translate to simple structs in C++, so
|
||||
we can construct them in a familiar way.
|
||||
|
||||
We have now serialized the non-scalar components of the monster
example, so we could create the monster something like this:

    auto mloc = CreateMonster(fbb, &vec, 150, 80, name, inventory, Color_Red, Offset<void>(0), Any_NONE);
|
||||
|
||||
Note that we're passing `150` for the `mana` field, which happens to be the
|
||||
default value: this means the field will not actually be written to the buffer,
|
||||
since we'll get that value anyway when we query it. This is a nice space
|
||||
savings, since it is very common for fields to be at their default. It means
|
||||
we also don't need to be scared to add fields only used in a minority of cases,
|
||||
since they won't bloat up the buffer sizes if they're not actually used.
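The effect of default values can be modeled in a few lines. This is a toy model, not the `FlatBufferBuilder` API: `ToyTableWriter` and its methods are hypothetical names, and the `hp` default of 100 in the usage note is assumed purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for a table under construction (hypothetical, for illustration).
class ToyTableWriter {
 public:
  // Records the field only when its value differs from the schema default.
  void AddInt(const std::string &name, std::int32_t value,
              std::int32_t default_value) {
    assert(!name.empty());
    if (value != default_value) stored_[name] = value;
  }

  // Returns the stored value, or the default when nothing was written.
  std::int32_t GetInt(const std::string &name,
                      std::int32_t default_value) const {
    const auto it = stored_.find(name);
    return it == stored_.end() ? default_value : it->second;
  }

  // Number of fields that actually take up space.
  std::size_t StoredFieldCount() const { return stored_.size(); }

 private:
  std::map<std::string, std::int32_t> stored_;
};
```

Adding `mana` as 150 with a default of 150 stores nothing, while adding `hp` as 80 with an assumed default of 100 stores one field; reading either back yields the expected value.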
|
||||
|
||||
We do something similar for the union field `test` by specifying a `0` offset
and the `NONE` enum value (part of every union) to indicate we don't actually
want to write this field.
|
||||
|
||||
Tables (like `Monster`) give you full flexibility on what fields you write
|
||||
(unlike `Vec3`, which always has all fields set because it is a `struct`).
|
||||
If you want even more control over this (i.e. skip fields even when they are
|
||||
not default), instead of the convenient `CreateMonster` call we can also
|
||||
build the object field-by-field manually:
|
||||
|
||||
    MonsterBuilder mb(fbb);
    mb.add_pos(&vec);
    mb.add_hp(80);
    mb.add_name(name);
    mb.add_inventory(inventory);
    auto mloc = mb.Finish();

We start with a temporary helper class `MonsterBuilder` (which is
defined in our generated code also), then call the various `add_`
methods to set fields, and `Finish` to complete the object. This is
pretty much the same code as you find inside `CreateMonster`, except
we're leaving out a few fields. Fields may also be added in any order,
though orderings with fields of the same size adjacent
to each other are most efficient in size, due to alignment. You should
not nest these Builder classes (serialize your
data in pre-order).
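To see why field order can affect size, consider a simplified model in which each scalar is placed at the next offset that is a multiple of its own size. `PaddedSize` is a hypothetical helper; how the real builder places fields is not described here, so treat this only as an illustration of alignment padding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: bytes used when scalars of the given sizes are laid
// out in order, each aligned to its own size.
inline std::size_t PaddedSize(const std::vector<std::size_t> &scalar_sizes) {
  std::size_t cursor = 0;
  for (std::size_t size : scalar_sizes) {
    assert(size != 0 && (size & (size - 1)) == 0);  // sizes are powers of two
    cursor = (cursor + size - 1) & ~(size - 1);     // round up to a multiple
    cursor += size;
  }
  return cursor;
}
```

Under this model, `PaddedSize({1, 4, 1, 4})` is 16 bytes, while the same fields grouped by size, `PaddedSize({4, 4, 1, 1})`, take 10.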
|
||||
|
||||
Regardless of whether you used `CreateMonster` or `MonsterBuilder`, you
|
||||
now have an offset to the root of your data, and you can finish the
|
||||
buffer using:
|
||||
|
||||
fbb.Finish(mloc);
|
||||
|
||||
The buffer is now ready to be stored somewhere, sent over the network,
compressed, or whatever you'd like to do with it. You can access the
start of the buffer with `fbb.GetBufferPointer()`, and its size from
`fbb.GetSize()`.
|
||||
|
||||
`samples/sample_binary.cpp` is a complete code sample similar to
|
||||
the code above, that also includes the reading code below.
|
||||
|
||||
### Reading in C++
|
||||
|
||||
If you've received a buffer from somewhere (disk, network, etc.) you can
directly start traversing it using:

    auto monster = GetMonster(buffer_pointer);

`monster` is of type `Monster *`, and points to somewhere inside your
buffer. If you look in your generated header, you'll see it has
convenient accessors for all fields, e.g.

    assert(monster->hp() == 80);
    assert(monster->mana() == 150);  // default
    assert(strcmp(monster->name()->c_str(), "MyMonster") == 0);

These should all be true. Note that we never stored a `mana` value, so
it will return the default.

To access sub-objects, in this case the `Vec3`:

    auto pos = monster->pos();
    assert(pos);
    assert(pos->z() == 3);

If we had not set the `pos` field during serialization, it would be
`NULL`.

Similarly, we can access elements of the inventory array:

    auto inv = monster->inventory();
    assert(inv);
    assert(inv->Get(9) == 9);
|
||||
|
||||
### Direct memory access
|
||||
|
||||
As you can see from the above examples, all elements in a buffer are
|
||||
accessed through generated accessors. This is because everything is
|
||||
stored in little endian format on all platforms (the accessor
|
||||
performs a swap operation on big endian machines), and also because
|
||||
the layout of things is generally not known to the user.
|
||||
|
||||
For structs, layout is deterministic and guaranteed to be the same
across platforms (scalars are aligned to their
own size, and structs themselves to their largest member), and you
are allowed to access this memory directly by using `sizeof()` and
`memcpy` on the pointer to a struct, or even an array of structs.

To compute offsets to sub-elements of a struct, make sure they
are structs themselves, as then you can use the pointers to
figure out the offset without having to hardcode it. This is
handy for use of arrays of structs with calls like `glVertexAttribPointer`
in OpenGL or similar APIs.
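A minimal sketch of the pointer technique follows. `Vec3` and `Vertex` here are plain stand-ins for generated structs, and `NormalOffset` is a hypothetical helper; none of these names come from the generated header.

```cpp
#include <cassert>
#include <cstddef>

// Plain stand-ins for generated structs (illustrative only).
struct Vec3 { float x, y, z; };
struct Vertex { Vec3 pos; Vec3 normal; };

// Byte offset of `normal` inside `Vertex`, derived from pointers rather
// than hardcoded.
inline std::size_t NormalOffset(const Vertex &v) {
  const char *base = reinterpret_cast<const char *>(&v);
  const char *member = reinterpret_cast<const char *>(&v.normal);
  assert(member >= base);
  return static_cast<std::size_t>(member - base);
}
```

With 4-byte floats the offset is 12 and `sizeof(Vertex)` is 24; these are the kind of values you would pass as the pointer offset and stride arguments of `glVertexAttribPointer`.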
|
||||
|
||||
It is important to note that structs are still little endian on all
machines, so only use tricks like this if you can guarantee you're not
shipping on a big endian machine (an `assert(FLATBUFFERS_LITTLEENDIAN)`
would be wise).
|
||||
|
||||
## Text & schema parsing
|
||||
|
||||
Using binary buffers with the generated header provides a super low
|
||||
overhead use of FlatBuffer data. There are, however, times when you want
|
||||
to use text formats, for example because it interacts better with source
|
||||
control, or you want to give your users easy access to data.
|
||||
|
||||
Another reason might be that you already have a lot of data in JSON
|
||||
format, or a tool that generates JSON, and if you can write a schema for
|
||||
it, this will provide you an easy way to use that data directly.
|
||||
|
||||
There are two ways to use text formats:
|
||||
|
||||
### Using the compiler as a conversion tool
|
||||
|
||||
This is the preferred path, as it doesn't require you to add any new
|
||||
code to your program, and is maximally efficient since you can ship with
|
||||
binary data. The disadvantage is that it is an extra step for your
|
||||
users/developers to perform, though you might be able to automate it.
|
||||
|
||||
flatc -b myschema.fbs mydata.json
|
||||
|
||||
This will generate the binary file `mydata_wire.bin` which can be loaded
|
||||
as before.
|
||||
|
||||
### Making your program capable of loading text directly
|
||||
|
||||
This gives you maximum flexibility. You could even opt to support both,
|
||||
i.e. check for both files, and regenerate the binary from text when
|
||||
required, otherwise just load the binary.
|
||||
|
||||
This option is currently only available for C++, or Java through JNI.
|
||||
|
||||
As mentioned in the section "Building" above, this technique requires
|
||||
you to link a few more files into your program, and you'll want to include
|
||||
`flatbuffers/idl.h`.
|
||||
|
||||
Load text (either a schema or json) into an in-memory buffer (there is a
|
||||
convenient `LoadFile()` utility function in `flatbuffers/util.h` if you
|
||||
wish). Construct a parser:
|
||||
|
||||
flatbuffers::Parser parser;
|
||||
|
||||
Now you can parse any number of text files in sequence:
|
||||
|
||||
parser.Parse(text_file.c_str());
|
||||
|
||||
This works similarly to how the command-line compiler works: a sequence
|
||||
of files parsed by the same `Parser` object allow later files to
|
||||
reference definitions in earlier files. Typically this means you first
|
||||
load a schema file (which populates `Parser` with definitions), followed
|
||||
by one or more JSON files.
|
||||
|
||||
If there were any parsing errors, `Parse` will return `false`, and
|
||||
`Parser::err` contains a human readable error string with a line number
|
||||
etc, which you should present to the creator of that file.
|
||||
|
||||
After each JSON file, the `Parser::fbb` member variable is the
|
||||
`FlatBufferBuilder` that contains the binary buffer version of that
|
||||
file, that you can access as described above.
|
||||
|
||||
`samples/sample_text.cpp` is a code sample showing the above operations.
|
||||
|
||||
### Threading
|
||||
|
||||
None of the code is thread-safe, by design. That said, since currently a
|
||||
FlatBuffer is read-only and entirely `const`, reading by multiple threads
|
||||
is possible.
|
||||
|
||||
126
docs/source/FlatBuffers.md
Normal file
@@ -0,0 +1,126 @@
|
||||
# FlatBuffers
|
||||
|
||||
FlatBuffers is an efficient cross platform serialization library for C++ and
Java. It was created at Google specifically for game development and other
performance-critical applications.
|
||||
|
||||
It is available as open source under the Apache license, v2 (see LICENSE.txt).
|
||||
|
||||
## Why use FlatBuffers?
|
||||
|
||||
- **Access to serialized data without parsing/unpacking** - What sets
|
||||
FlatBuffers apart is that it represents hierarchical data in a flat
|
||||
binary buffer in such a way that it can still be accessed directly
|
||||
without parsing/unpacking, while also still supporting data
|
||||
structure evolution (forwards/backwards compatibility).
|
||||
|
||||
- **Memory efficiency and speed** - The only memory needed to access
|
||||
your data is that of the buffer. It requires 0 additional allocations.
|
||||
FlatBuffers is also very
|
||||
suitable for use with mmap (or streaming), requiring only part of the
|
||||
buffer to be in memory. Access is close to the speed of raw
|
||||
struct access with only one extra indirection (a kind of vtable) to
|
||||
allow for format evolution and optional fields. It is aimed at
|
||||
projects where spending time and space (many memory allocations) to
|
||||
be able to access or construct serialized data is undesirable, such
|
||||
as in games or any other performance sensitive applications. See the
|
||||
[benchmarks](md__benchmarks.html) for details.
|
||||
|
||||
- **Flexible** - Optional fields mean that not only do you get great
  forwards and backwards compatibility (increasingly important for
  long-lived games: you don't have to update all data with each new
  version!), you also have a lot of choice in what data you
  write and what data you don't, and how you design data structures.
|
||||
|
||||
- **Tiny code footprint** - Small amounts of generated code, and just
|
||||
a single small header as the minimum dependency, which is very easy
|
||||
to integrate. Again, see the benchmark section for details.
|
||||
|
||||
- **Strongly typed** - Errors happen at compile time rather than
|
||||
manually having to write repetitive and error prone run-time checks.
|
||||
Useful code can be generated for you.
|
||||
|
||||
- **Convenient to use** - Generated C++ code allows for terse access
|
||||
& construction code. Then there's optional functionality for parsing
|
||||
schemas and JSON-like text representations at runtime efficiently if
|
||||
needed (faster and more memory efficient than other JSON
|
||||
parsers).
|
||||
|
||||
Java code supports object-reuse.
|
||||
|
||||
- **Cross platform C++11/Java code with no dependencies** - will work with
|
||||
any recent gcc/clang and VS2010. Comes with build files for the tests &
|
||||
samples (Android .mk files, and cmake for all other platforms).
|
||||
|
||||
### Why not use Protocol Buffers, or .. ?
|
||||
|
||||
Protocol Buffers is indeed relatively similar to FlatBuffers,
with the primary difference being that FlatBuffers does not need a
parsing/unpacking step to a secondary representation before you can
access data, a step that is often coupled with per-object memory allocation.
The Protocol Buffers code is an order of magnitude bigger, too. Protocol
Buffers has neither optional text import/export nor schema language features
like unions.
|
||||
|
||||
### But all the cool kids use JSON!
|
||||
|
||||
JSON is very readable (which is why we use it as our optional text
|
||||
format) and very convenient when used together with dynamically typed
|
||||
languages (such as JavaScript). When serializing data from statically
|
||||
typed languages, however, JSON not only has the obvious drawback of runtime
|
||||
inefficiency, but also forces you to write *more* code to access data
|
||||
(counterintuitively) due to its dynamic-typing serialization system.
|
||||
In this context, it is only a better choice for systems that have very
|
||||
little to no information ahead of time about what data needs to be stored.
|
||||
|
||||
Read more about the "why" of FlatBuffers in the
|
||||
[white paper](md__white_paper.html).
|
||||
|
||||
## Usage in brief
|
||||
|
||||
This section is a quick rundown of how to use this system. Subsequent
|
||||
sections provide a more in-depth usage guide.
|
||||
|
||||
- Write a schema file that allows you to define the data structures
|
||||
you may want to serialize. Fields can have a scalar type
|
||||
(ints/floats of all sizes), or they can be a: string; array of any type;
|
||||
reference to yet another object; or, a set of possible objects (unions).
|
||||
Fields are optional and have defaults, so they don't need to be
|
||||
present for every object instance.
|
||||
|
||||
- Use `flatc` (the FlatBuffer compiler) to generate a C++ header (or Java
|
||||
classes) with helper classes to access and construct serialized data. This
|
||||
header (say `mydata_generated.h`) only depends on `flatbuffers.h`, which
|
||||
defines the core functionality.
|
||||
|
||||
- Use the `FlatBufferBuilder` class to construct a flat binary buffer.
|
||||
The generated functions allow you to add objects to this
|
||||
buffer recursively, often as simply as making a single function call.
|
||||
|
||||
- Store or send your buffer somewhere!
|
||||
|
||||
- When reading it back, you can obtain the pointer to the root object
|
||||
from the binary buffer, and from there traverse it conveniently
|
||||
in-place with `object->field()`.
|
||||
|
||||
## In-depth documentation
|
||||
|
||||
- How to [build the compiler](md__building.html) and samples on various
|
||||
platforms.
|
||||
- How to [use the compiler](md__compiler.html).
|
||||
- How to [write a schema](md__schemas.html).
|
||||
- How to [use the generated C++ code](md__cpp_usage.html) in your own
|
||||
programs.
|
||||
- How to [use the generated Java code](md__java_usage.html) in your own
|
||||
programs.
|
||||
- Some [benchmarks](md__benchmarks.html) showing the advantage of using
|
||||
FlatBuffers.
|
||||
- A [white paper](md__white_paper.html) explaining the "why" of FlatBuffers.
|
||||
- A description of the [internals](md__internals.html) of FlatBuffers.
|
||||
- A formal [grammar](md__grammar.html) of the schema language.
|
||||
|
||||
## Online resources
|
||||
|
||||
- [github repository](http://github.com/google/flatbuffers)
|
||||
- [landing page](http://google.github.io/flatbuffers)
|
||||
- [FlatBuffers Google Group](http://group.google.com/group/flatbuffers)
|
||||
- [FlatBuffers Issues Tracker](http://github.com/google/flatbuffers/issues)
|
||||
30
docs/source/Grammar.md
Executable file
@@ -0,0 +1,30 @@
|
||||
# Formal Grammar of the schema language
|
||||
|
||||
schema = namespace\_decl | type\_decl | enum\_decl | root\_decl | object
|
||||
|
||||
namespace\_decl = `namespace` ident ( `.` ident )* `;`
|
||||
|
||||
type\_decl = ( `table` | `struct` ) ident metadata `{` field\_decl+ `}`
|
||||
|
||||
enum\_decl = ( `enum` | `union` ) ident [ `:` type ] metadata `{` commasep(
|
||||
enumval\_decl ) `}`
|
||||
|
||||
root\_decl = `root_type` ident `;`
|
||||
|
||||
field\_decl = ident `:` type [ `=` scalar ] metadata `;`
|
||||
|
||||
type = `bool` | `byte` | `ubyte` | `short` | `ushort` | `int` | `uint` |
|
||||
`float` | `long` | `ulong` | `double`
|
||||
| `string` | `[` type `]` | ident
|
||||
|
||||
enumval\_decl = ident [ `=` integer\_constant ]
|
||||
|
||||
metadata = [ `(` commasep( ident [ `:` scalar ] ) `)` ]
|
||||
|
||||
scalar = integer\_constant | float\_constant | `true` | `false`
|
||||
|
||||
object = { commasep( ident `:` value ) }
|
||||
|
||||
value = scalar | object | string\_constant | `[` commasep( value ) `]`
|
||||
|
||||
commasep(x) = [ x ( `,` x )\* ]
|
||||
244
docs/source/Internals.md
Executable file
@@ -0,0 +1,244 @@
|
||||
# FlatBuffer Internals
|
||||
|
||||
This section is entirely optional for the use of FlatBuffers. In normal
|
||||
usage, you should never need the information contained herein. If you're
|
||||
interested however, it should give you more of an appreciation of why
|
||||
FlatBuffers is both efficient and convenient.
|
||||
|
||||
### Format components
|
||||
|
||||
A FlatBuffer is a binary file and in-memory format consisting mostly of
|
||||
scalars of various sizes, all aligned to their own size. Each scalar is
|
||||
also always represented in little-endian format, as this corresponds to
|
||||
all commonly used CPUs today. FlatBuffers will also work on big-endian
|
||||
machines, but will be slightly slower because of additional
|
||||
byte-swap intrinsics.
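As an illustration of what reading a little-endian scalar involves, here is a portable sketch that assembles the value byte by byte instead of using a byte-swap intrinsic. `ReadLE32` and `WriteLE32` are hypothetical names, not functions from `flatbuffers.h`.

```cpp
#include <cassert>
#include <cstdint>

// Reads a 32-bit little-endian value; gives the same result on little- and
// big-endian hosts because it never reinterprets memory as an integer.
inline std::uint32_t ReadLE32(const std::uint8_t *p) {
  assert(p != nullptr);
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Writes a 32-bit value in little-endian byte order.
inline void WriteLE32(std::uint8_t *p, std::uint32_t value) {
  assert(p != nullptr);
  p[0] = static_cast<std::uint8_t>(value & 0xFFu);
  p[1] = static_cast<std::uint8_t>((value >> 8) & 0xFFu);
  p[2] = static_cast<std::uint8_t>((value >> 16) & 0xFFu);
  p[3] = static_cast<std::uint8_t>((value >> 24) & 0xFFu);
}
```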
|
||||
|
||||
On purpose, the format leaves a lot of details about where exactly
things live in memory undefined, e.g. fields in a table can have any
order, and objects to some extent can be stored in many orders. This is
because the format doesn't need this information to be efficient, and it
leaves room for optimization and extension (for example, fields can be
packed in a way that is most compact). Instead, the format is defined in
terms of offsets and adjacency only.
|
||||
|
||||
### Format identification
|
||||
|
||||
The format also doesn't contain information for format identification
|
||||
and versioning, which is also by design. FlatBuffers is a statically typed
|
||||
system, meaning the user of a buffer needs to know what kind of buffer
|
||||
it is. FlatBuffers can of course be wrapped inside other containers
|
||||
where needed, or you can use its union feature to dynamically identify
|
||||
multiple possible sub-objects stored. Additionally, it can be used
|
||||
together with the schema parser if full reflective capabilities are
|
||||
desired.
|
||||
|
||||
Versioning is something that is intrinsically part of the format (the
|
||||
optionality / extensibility of fields), so the format itself does not
|
||||
need a version number (it's a meta-format, in a sense). We're hoping
|
||||
that this format can accommodate all data needed. If format breaking
|
||||
changes are ever necessary, it would become a new kind of format rather
|
||||
than just a variation.
|
||||
|
||||
### Offsets
|
||||
|
||||
The most important and generic offset type (see `flatbuffers.h`) is
|
||||
`offset_t`, which is currently always a `uint32_t`, and is used to
|
||||
refer to all tables/unions/strings/vectors. 32bit is
|
||||
intentional, since we want to keep the format binary compatible between
|
||||
32 and 64bit systems, and a 64bit offset would bloat the size for almost
|
||||
all uses. A version of this format with 64bit (or 16bit) offsets is easy to set
|
||||
when needed. Unsigned means they can only point in one direction, which
|
||||
typically is forward (towards a higher memory location). Any backwards
|
||||
offsets will be explicitly marked as such.
|
||||
|
||||
The format starts with an `offset_t` to the root object in the buffer.
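Locating the root object is then a single offset read. The sketch below assumes the offset is measured from the start of the buffer and decodes it byte by byte; `RootPosition` is a hypothetical name, and the bounds check is an addition of this example rather than something the format requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the position of the root object, read from the 32-bit
// little-endian offset stored in the first four bytes of the buffer.
inline std::size_t RootPosition(const std::uint8_t *buffer,
                                std::size_t length) {
  assert(buffer != nullptr && length >= 4);
  const std::uint32_t offset =
      static_cast<std::uint32_t>(buffer[0]) |
      (static_cast<std::uint32_t>(buffer[1]) << 8) |
      (static_cast<std::uint32_t>(buffer[2]) << 16) |
      (static_cast<std::uint32_t>(buffer[3]) << 24);
  assert(offset < length);  // the root must lie inside the buffer
  return offset;
}
```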
|
||||
|
||||
We have two kinds of objects, structs and tables.
|
||||
|
||||
### Structs
|
||||
|
||||
These are the simplest, and as mentioned, intended for simple data that
|
||||
benefits from being extra efficient and doesn't need versioning /
|
||||
extensibility. They are always stored inline in their parent (a struct,
|
||||
table, or vector) for maximum compactness. Structs define a consistent
|
||||
memory layout where all components are aligned to their size, and
|
||||
structs aligned to their largest scalar member. This is done independent
|
||||
of the alignment rules of the underlying compiler to guarantee a cross
|
||||
platform compatible layout. This layout is then enforced in the generated
|
||||
code.
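The layout rules above can be written down as a small calculation. `StructLayout` and `ComputeStructLayout` are hypothetical names for this sketch; it handles scalar members only, and padding the total size up to the struct's alignment is an assumption made here so that arrays of the struct stay aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Result of laying out a struct made of scalars (illustrative only).
struct StructLayout {
  std::vector<std::size_t> offsets;  // byte offset of each member
  std::size_t alignment = 1;         // alignment of the struct as a whole
  std::size_t size = 0;              // total size, including trailing padding
};

inline StructLayout ComputeStructLayout(
    const std::vector<std::size_t> &scalar_sizes) {
  StructLayout layout;
  std::size_t cursor = 0;
  for (std::size_t size : scalar_sizes) {
    assert(size != 0 && (size & (size - 1)) == 0);  // power-of-two sizes
    cursor = (cursor + size - 1) & ~(size - 1);     // align to own size
    layout.offsets.push_back(cursor);
    cursor += size;
    layout.alignment = std::max(layout.alignment, size);
  }
  // Assumption: pad the end so the size is a multiple of the alignment.
  layout.size =
      (cursor + layout.alignment - 1) & ~(layout.alignment - 1);
  return layout;
}
```

For three 4-byte floats this gives an alignment of 4 and a size of 12. For members of sizes 1, 8 and 2 the offsets are 0, 8 and 16, and the size is 24.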
|
||||
|
||||
### Tables
|
||||
|
||||
These start with an `soffset_t` to a vtable (signed version of
|
||||
`offset_t`, since vtables may be stored anywhere), followed by all the
|
||||
fields as aligned scalars. Unlike structs, not all fields need to be
|
||||
present. There is no set order and layout.
|
||||
|
||||
To be able to access fields regardless of these uncertainties, we go
|
||||
through a vtable of offsets. Vtables are shared between any objects that
|
||||
happen to have the same vtable values.
|
||||
|
||||
The elements of a vtable are all of type `voffset_t`, which is currently
a `uint16_t`. The first element is the number of elements of the vtable,
including this one. The second one is the size of the object, in bytes
(including the vtable offset). This size is used for streaming, to know
how many bytes to read to be able to access all fields of the object.
The remaining elements are the N offsets, where N is the number of fields
declared in the schema when the code that constructed this buffer was
compiled (thus, the size of the vtable is N + 2 elements).
|
||||
|
||||
All accessor functions in the generated code for tables contain the
offset into the vtable as a constant. This offset is checked against the
first element (the number of elements), to protect against newer code
reading older data. If this offset is out of range, or the vtable entry
is 0, that means the field is not present in this object, and the
default value is returned. Otherwise, the entry is used as the offset to the
field to be read.
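The lookup just described can be sketched as follows. This is a simplified model, not the generated accessor code: it addresses fields by index rather than by a precomputed constant, assumes the vtable is located at the table position minus the stored `soffset_t`, and uses host byte order to stay short (real buffers are always little-endian). `ReadScalar`, `WriteScalar`, `GetInt16Field` and `BuildSampleTable` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using voffset_t = std::uint16_t;  // vtable element type, as in the text
using soffset_t = std::int32_t;   // signed offset from a table to its vtable

template <typename T>
T ReadScalar(const std::uint8_t *p) {
  T value;
  std::memcpy(&value, p, sizeof(T));  // host byte order, for brevity
  return value;
}

template <typename T>
void WriteScalar(std::uint8_t *p, T value) {
  std::memcpy(p, &value, sizeof(T));
}

// Looks up a 16-bit field by index, falling back to the default when the
// vtable is too short (older data) or the entry is 0 (field not written).
inline std::int16_t GetInt16Field(const std::uint8_t *table,
                                  std::size_t field_index,
                                  std::int16_t default_value) {
  assert(table != nullptr);
  // Assumption: the vtable lives at `table - soffset`.
  const std::uint8_t *vtable = table - ReadScalar<soffset_t>(table);
  const voffset_t num_elements = ReadScalar<voffset_t>(vtable);
  const std::size_t slot = field_index + 2;  // skip count and object size
  if (slot >= num_elements) return default_value;
  const voffset_t field_offset =
      ReadScalar<voffset_t>(vtable + slot * sizeof(voffset_t));
  if (field_offset == 0) return default_value;
  return ReadScalar<std::int16_t>(table + field_offset);
}

// Builds a 16-byte buffer: a 4-element vtable at byte 0, then a table at
// byte 8 that stores field 0 (value 80) and omits field 1.
inline std::vector<std::uint8_t> BuildSampleTable() {
  std::vector<std::uint8_t> buf(16, 0);
  WriteScalar<voffset_t>(&buf[0], 4);   // number of vtable elements
  WriteScalar<voffset_t>(&buf[2], 6);   // object size in bytes
  WriteScalar<voffset_t>(&buf[4], 4);   // field 0 lives at table + 4
  WriteScalar<voffset_t>(&buf[6], 0);   // field 1 is absent
  WriteScalar<soffset_t>(&buf[8], 8);   // table (byte 8) - vtable (byte 0)
  WriteScalar<std::int16_t>(&buf[12], 80);
  return buf;
}
```

With the table at `buf.data() + 8`, field 0 reads back as 80, field 1 yields whatever default the caller passes, and any index beyond the vtable (as newer code reading older data would produce) also yields the default.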
|
||||
|
||||
### Strings and Vectors
|
||||
|
||||
Strings are simply a vector of bytes, and are always
|
||||
null-terminated. Vectors are stored as contiguous aligned scalar
|
||||
elements prefixed by a count.
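As a sketch of this layout, the function below encodes a string as a count, the bytes, and a terminator. It assumes a 32-bit little-endian count that excludes the terminating null; neither detail is spelled out above, and `EncodeString` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes `s` as: 32-bit little-endian count, the bytes, then a 0 byte.
inline std::vector<std::uint8_t> EncodeString(const std::string &s) {
  assert(s.size() <= 0xFFFFFFFFu);
  const std::uint32_t count = static_cast<std::uint32_t>(s.size());
  std::vector<std::uint8_t> out;
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<std::uint8_t>((count >> shift) & 0xFFu));
  }
  out.insert(out.end(), s.begin(), s.end());
  out.push_back(0);  // strings are always null-terminated
  return out;
}
```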
|
||||
|
||||
### Construction
|
||||
|
||||
The current implementation constructs these buffers backwards, since
|
||||
that significantly reduces the amount of bookkeeping and simplifies the
|
||||
construction API.
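A toy model of backwards construction is shown below. `BackwardsBuffer` is a hypothetical class, far simpler than `FlatBufferBuilder` (fixed capacity, no alignment), meant only to show the direction of growth.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fills a fixed-size buffer from the back towards the front.
class BackwardsBuffer {
 public:
  explicit BackwardsBuffer(std::size_t capacity)
      : storage_(capacity), head_(capacity) {}

  // Copies `length` bytes in front of everything written so far and
  // returns the new item's distance from the end of the storage.
  std::size_t Prepend(const void *bytes, std::size_t length) {
    assert(length <= head_);  // a real builder would grow instead
    head_ -= length;
    std::memcpy(storage_.data() + head_, bytes, length);
    return storage_.size() - head_;
  }

  const std::uint8_t *data() const { return storage_.data() + head_; }
  std::size_t size() const { return storage_.size() - head_; }

 private:
  std::vector<std::uint8_t> storage_;
  std::size_t head_;  // index of the first used byte
};
```

Because a child is written before the object that refers to it, the child ends up at a higher address, so the parent can reach it with an unsigned, forward-pointing offset computed as the difference of the two values returned by `Prepend`.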
|
||||
|
||||
### Code example
|
||||
|
||||
Here's an example of the code that gets generated for `samples/monster.fbs`.
What follows is the entire file, broken up by comments:
|
||||
|
||||
// automatically generated, do not modify
|
||||
|
||||
#include "flatbuffers/flatbuffers.h"
|
||||
|
||||
namespace MyGame {
|
||||
namespace Sample {
|
||||
|
||||
Nested namespace support.
|
||||
|
||||
enum {
|
||||
Color_Red = 0,
|
||||
Color_Green = 1,
|
||||
Color_Blue = 2,
|
||||
};
|
||||
|
||||
inline const char **EnumNamesColor() {
|
||||
static const char *names[] = { "Red", "Green", "Blue", nullptr };
|
||||
return names;
|
||||
}
|
||||
|
||||
inline const char *EnumNameColor(int e) { return EnumNamesColor()[e]; }
|
||||
|
||||
Enums and convenient reverse lookup.
|
||||
|
||||
enum {
|
||||
Any_NONE = 0,
|
||||
Any_Monster = 1,
|
||||
};
|
||||
|
||||
inline const char **EnumNamesAny() {
|
||||
static const char *names[] = { "NONE", "Monster", nullptr };
|
||||
return names;
|
||||
}
|
||||
|
||||
inline const char *EnumNameAny(int e) { return EnumNamesAny()[e]; }
|
||||
|
||||
Unions share a lot with enums.
|
||||
|
||||
struct Vec3;
|
||||
struct Monster;
|
||||
|
||||
Predeclare all datatypes since there may be circular references.
|
||||
|
||||
MANUALLY_ALIGNED_STRUCT(4) Vec3 {
|
||||
private:
|
||||
float x_;
|
||||
float y_;
|
||||
float z_;
|
||||
|
||||
public:
|
||||
Vec3(float x, float y, float z)
|
||||
: x_(flatbuffers::EndianScalar(x)), y_(flatbuffers::EndianScalar(y)), z_(flatbuffers::EndianScalar(z)) {}
|
||||
|
||||
float x() const { return flatbuffers::EndianScalar(x_); }
|
||||
float y() const { return flatbuffers::EndianScalar(y_); }
|
||||
float z() const { return flatbuffers::EndianScalar(z_); }
|
||||
};
|
||||
STRUCT_END(Vec3, 12);
|
||||
|
||||
These ugly macros do a couple of things: they turn off any padding the compiler
|
||||
might normally do, since we add padding manually (though none in this example),
|
||||
and they enforce alignment chosen by FlatBuffers. This ensures the layout of
|
||||
this struct will look the same regardless of compiler and platform. Note that
|
||||
the fields are private: this is because these store little endian scalars
|
||||
regardless of platform (since this is part of the serialized data).
|
||||
`EndianScalar` then converts back and forth, which is a no-op on all current
|
||||
mobile and desktop platforms, and a single machine instruction on the few
|
||||
remaining big endian platforms.
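
A conversion of this kind could look roughly like the sketch below. It is an assumption about the general technique, not the library's `EndianScalar`: it detects endianness at run time and reverses bytes generically, whereas a real implementation would likely decide at compile time and use a byte-swap instruction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// True when the least significant byte is stored first.
inline bool IsLittleEndianHostSketch() {
  const uint16_t probe = 1;
  uint8_t first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Reverse the bytes of any trivially copyable scalar.
template <typename T>
T ByteSwapSketch(T value) {
  uint8_t bytes[sizeof(T)];
  std::memcpy(bytes, &value, sizeof(T));
  std::reverse(bytes, bytes + sizeof(T));
  std::memcpy(&value, bytes, sizeof(T));
  return value;
}

// Convert between host order and little-endian storage order. The same
// function works in both directions, because swapping twice is the identity.
template <typename T>
T EndianScalarSketch(T value) {
  return IsLittleEndianHostSketch() ? value : ByteSwapSketch(value);
}
```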
|
||||
|
||||
struct Monster : private flatbuffers::Table {
|
||||
const Vec3 *pos() const { return GetStruct<const Vec3 *>(4); }
|
||||
int16_t mana() const { return GetField<int16_t>(6, 150); }
|
||||
int16_t hp() const { return GetField<int16_t>(8, 100); }
|
||||
const flatbuffers::String *name() const { return GetPointer<const flatbuffers::String *>(10); }
|
||||
const flatbuffers::Vector<uint8_t> *inventory() const { return GetPointer<const flatbuffers::Vector<uint8_t> *>(14); }
|
||||
int8_t color() const { return GetField<int8_t>(16, 2); }
|
||||
};
|
||||
|
||||
Tables are a bit more complicated. A table accessor struct is used to point at
|
||||
the serialized data for a table, which always starts with an offset to its
|
||||
vtable. It derives from `Table`, which contains the `GetField` helper functions.
|
||||
GetField takes a vtable offset, and a default value. It will look in the vtable
|
||||
at that offset. If the offset is out of bounds (data from an older version) or
|
||||
the vtable entry is 0, the field is not present and the default is returned.
|
||||
Otherwise, it uses the entry as an offset into the table to locate the field.
|
||||
|
||||
struct MonsterBuilder {
|
||||
flatbuffers::FlatBufferBuilder &fbb_;
|
||||
flatbuffers::uoffset_t start_;
|
||||
void add_pos(const Vec3 *pos) { fbb_.AddStruct(4, pos); }
|
||||
void add_mana(int16_t mana) { fbb_.AddElement<int16_t>(6, mana, 150); }
|
||||
void add_hp(int16_t hp) { fbb_.AddElement<int16_t>(8, hp, 100); }
|
||||
void add_name(flatbuffers::Offset<flatbuffers::String> name) { fbb_.AddOffset(10, name); }
|
||||
void add_inventory(flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory) { fbb_.AddOffset(14, inventory); }
|
||||
void add_color(int8_t color) { fbb_.AddElement<int8_t>(16, color, 2); }
|
||||
MonsterBuilder(flatbuffers::FlatBufferBuilder &_fbb) : fbb_(_fbb) { start_ = fbb_.StartTable(); }
|
||||
flatbuffers::Offset<Monster> Finish() { return flatbuffers::Offset<Monster>(fbb_.EndTable(start_, 7)); }
|
||||
};
|
||||
|
||||
`MonsterBuilder` is the base helper struct to construct a table using a
|
||||
`FlatBufferBuilder`. You can add the fields in any order, and the `Finish`
|
||||
call will ensure the correct vtable gets generated.
|
||||
|
||||
inline flatbuffers::Offset<Monster> CreateMonster(flatbuffers::FlatBufferBuilder &_fbb, const Vec3 *pos, int16_t mana, int16_t hp, flatbuffers::Offset<flatbuffers::String> name, flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory, int8_t color) {
|
||||
MonsterBuilder builder_(_fbb);
|
||||
builder_.add_inventory(inventory);
|
||||
builder_.add_name(name);
|
||||
builder_.add_pos(pos);
|
||||
builder_.add_hp(hp);
|
||||
builder_.add_mana(mana);
|
||||
builder_.add_color(color);
|
||||
return builder_.Finish();
|
||||
}
|
||||
|
||||
`CreateMonster` is a convenience function that calls all functions in
|
||||
`MonsterBuilder` above for you. Note that if you pass values which are
|
||||
defaults as arguments, it will not actually construct that field, so
|
||||
you can probably use this function instead of the builder class in
|
||||
almost all cases.
|
||||
|
||||
inline const Monster *GetMonster(const void *buf) { return flatbuffers::GetRoot<Monster>(buf); }
|
||||
|
||||
This function is only generated for the root table type, to be able to
|
||||
start traversing a FlatBuffer from a raw buffer pointer.
|
||||
|
||||
}  // namespace Sample
|
||||
}  // namespace MyGame
|
||||
|
||||
|
||||
79
docs/source/JavaUsage.md
Executable file
@@ -0,0 +1,79 @@
|
||||
# Use in Java
|
||||
|
||||
There's experimental support for reading FlatBuffers in Java. Generate code
|
||||
for Java with the `-j` option to `flatc`.
|
||||
|
||||
See `javaTest.java` for an example. Essentially, you read a FlatBuffer binary
|
||||
file into a `byte[]`, which you then turn into a `ByteBuffer`, which you pass to
|
||||
the `getRootAsMonster` function:
|
||||
|
||||
ByteBuffer bb = ByteBuffer.wrap(data);
|
||||
Monster monster = Monster.getRootAsMonster(bb);
|
||||
|
||||
Now you can access values much like C++:
|
||||
|
||||
short hp = monster.hp();
|
||||
Vec3 pos = monster.pos();
|
||||
|
||||
Note that whenever you access a new object like in the `pos` example above,
|
||||
a new temporary accessor object gets created. If your code is very performance
|
||||
sensitive (you iterate through a lot of objects), there's a second `pos()`
|
||||
method to which you can pass a `Vec3` object you've already created. This allows
|
||||
you to reuse it across many calls and reduce the amount of object allocation (and
|
||||
thus garbage collection) your program does.
|
||||
|
||||
Sadly the string accessors currently always create a new string when accessed,
|
||||
since FlatBuffers' UTF-8 strings can't be read in place by Java.
|
||||
|
||||
Vector access is also a bit different from C++: you pass an extra index
|
||||
to the vector field accessor. Then a second method with the same name
|
||||
suffixed by `_length` lets you know the number of elements you can access:
|
||||
|
||||
for (int i = 0; i < monster.inventory_length(); i++)
|
||||
monster.inventory(i); // do something here
|
||||
|
||||
You can also construct these buffers in Java using the static methods found
|
||||
in the generated code, and the FlatBufferBuilder class:
|
||||
|
||||
FlatBufferBuilder fbb = new FlatBufferBuilder();
|
||||
|
||||
Create strings:
|
||||
|
||||
int str = fbb.createString("MyMonster");
|
||||
|
||||
Create a table with a struct contained therein:
|
||||
|
||||
Monster.startMonster(fbb);
|
||||
Monster.addPos(fbb, Vec3.createVec3(fbb, 1.0f, 2.0f, 3.0f, 3.0, (byte)4, (short)5, (byte)6));
|
||||
Monster.addHp(fbb, (short)80);
|
||||
Monster.addName(fbb, str);
|
||||
Monster.addInventory(fbb, inv);
|
||||
Monster.addTest_type(fbb, (byte)1);
|
||||
Monster.addTest(fbb, mon2);
|
||||
Monster.addTest4(fbb, test4s);
|
||||
int mon = Monster.endMonster(fbb);
|
||||
|
||||
As you can see, the Java code for tables does not use a convenient
|
||||
`createMonster` call like the C++ code. This is to create the buffer without
|
||||
using temporary object allocation (since the `Vec3` is an inline component of
|
||||
`Monster`, it has to be created right where it is added, whereas the name and
|
||||
the inventory are not inline).
|
||||
Structs do have convenient methods that even have arguments for nested structs.
|
||||
|
||||
Vectors also use this start/end pattern to allow vectors of both scalar types
|
||||
and structs:
|
||||
|
||||
Monster.startInventoryVector(fbb, 5);
|
||||
for (byte i = 4; i >= 0; i--) fbb.addByte(i);
|
||||
int inv = fbb.endVector();
|
||||
|
||||
You can use the generated method `startInventoryVector` to conveniently call
|
||||
`startVector` with the right element size. You pass the number of
|
||||
elements you want to write. You write the elements backwards since the buffer
|
||||
is being constructed back to front.
|
||||
|
||||
## Text Parsing
|
||||
|
||||
There is currently no support for parsing text (schemas and JSON) directly
|
||||
from Java, though you could use the C++ parser through JNI. Please see the
|
||||
C++ documentation for more on text parsing.
|
||||
198
docs/source/Schemas.md
Executable file
@@ -0,0 +1,198 @@
|
||||
# Writing a schema
|
||||
|
||||
The syntax of the schema language (aka IDL, Interface Definition
|
||||
Language) should look quite familiar to users of any of the C family of
|
||||
languages, and also to users of other IDLs. Let's look at an example
|
||||
first:
|
||||
|
||||
// example IDL file
|
||||
|
||||
namespace MyGame;
|
||||
|
||||
enum Color : byte { Red = 1, Green, Blue }
|
||||
|
||||
union Any { Monster, Weapon, Pickup }
|
||||
|
||||
struct Vec3 {
|
||||
x:float;
|
||||
y:float;
|
||||
z:float;
|
||||
}
|
||||
|
||||
table Monster {
|
||||
pos:Vec3;
|
||||
mana:short = 150;
|
||||
hp:short = 100;
|
||||
name:string;
|
||||
friendly:bool = false (deprecated, priority: 1);
|
||||
inventory:[ubyte];
|
||||
color:Color = Blue;
|
||||
test:Any;
|
||||
}
|
||||
|
||||
root_type Monster;
|
||||
|
||||
(Weapon & Pickup not defined as part of this example).
|
||||
|
||||
### Tables
|
||||
|
||||
Tables are the main way of defining objects in FlatBuffers, and consist
|
||||
of a name (here `Monster`) and a list of fields. Each field has a name,
|
||||
a type, and optionally a default value (if omitted, it defaults to 0 /
|
||||
NULL).
|
||||
|
||||
Each field is optional: It does not have to appear in the wire
|
||||
representation, and you can choose to omit fields for each individual
|
||||
object. As a result, you have the flexibility to add fields without fear of
|
||||
bloating your data. This design is also FlatBuffers' mechanism for forward
|
||||
and backwards compatibility. Note that:
|
||||
|
||||
- You can add new fields in the schema ONLY at the end of a table
|
||||
definition. Older data will still
|
||||
read correctly, and give you the default value when read. Older code
|
||||
will simply ignore the new field.
|
||||
|
||||
- You cannot delete fields you don't use anymore from the schema,
|
||||
but you can simply
|
||||
stop writing them into your data for almost the same effect.
|
||||
Additionally you can mark them as `deprecated` as in the example
|
||||
above, which will prevent the generation of accessors in the
|
||||
generated C++, as a way to enforce the field not being used any more.
|
||||
(careful: this may break code!).
|
||||
|
||||
- You may change field names and table names, if you're ok with your
|
||||
code breaking until you've renamed them there too.
|
||||
|
||||
|
||||
|
||||
### Structs
|
||||
|
||||
Similar to a table, only now none of the fields are optional (so no defaults
|
||||
either), and fields may not be added or be deprecated. Structs may only contain
|
||||
scalars or other structs. Use this for
|
||||
simple objects where you are very sure no changes will ever be made
|
||||
(as is quite clear in the example `Vec3`). Structs use less memory than
|
||||
tables and are even faster to access (they are always stored in-line in their
|
||||
parent object, and use no virtual table).
|
||||
|
||||
### Types
|
||||
|
||||
Builtin scalar types are:
|
||||
|
||||
- 8 bit: `byte ubyte bool`
|
||||
|
||||
- 16 bit: `short ushort`
|
||||
|
||||
- 32 bit: `int uint float`
|
||||
|
||||
- 64 bit: `long ulong double`
|
||||
|
||||
- Vector of any other type (denoted with `[type]`). Nesting vectors
|
||||
requires you to wrap the inner vector in a struct/table rather than
|
||||
writing `[[type]]`.
|
||||
|
||||
- `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
|
||||
or general binary data use vectors (`[byte]` or `[ubyte]`) instead.
|
||||
|
||||
- References to other tables or structs, enums or unions (see
|
||||
below).
|
||||
|
||||
You can't change types of fields once they're used, with the exception
|
||||
of same-size data where a `reinterpret_cast` would give you a desirable result,
|
||||
e.g. you could change a `uint` to an `int` if no values in current data use the
|
||||
high bit yet.
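
The `uint` to `int` case can be checked with a small sketch. `ReadAsIntSketch` is a hypothetical helper for this example; it copies the bytes instead of using `reinterpret_cast`, which has the same effect here without aliasing concerns.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the 4 bytes of a stored `uint` as an `int`.
inline int32_t ReadAsIntSketch(uint32_t stored) {
  int32_t result;
  std::memcpy(&result, &stored, sizeof(result));
  return result;
}
```

Values below 2^31 read back unchanged, while any stored value with the high bit set would suddenly read as negative, which is why the change is only safe if no existing data uses that bit.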
|
||||
|
||||
### (Default) Values
|
||||
|
||||
Values are a sequence of digits, optionally followed by a `.` and more digits
|
||||
for float constants, and optionally prefixed by a `-`. Non-scalar defaults are
|
||||
currently not supported (always NULL).
|
||||
|
||||
You generally do not want to change default values after they're initially
|
||||
defined. Fields that have the default value are not actually stored in the
|
||||
serialized data but are generated in code, so when you change the default, you'd
|
||||
now get a different value than from code generated from an older version of
|
||||
the schema. There are situations however where this may be
|
||||
desirable, especially if you can ensure a simultaneous rebuild of
|
||||
all code.
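
The effect of changing a default can be seen in a toy model. The sketch below stands in for a serialized table with a `std::map`; `ToyTable`, `WriteUnlessDefault`, and `ReadWithDefault` are hypothetical names for this illustration and have nothing to do with the real wire format.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for a serialized table: only explicitly stored fields appear.
using ToyTable = std::map<std::string, int16_t>;

// Writer: skip the field when the value equals the writer's schema default.
inline void WriteUnlessDefault(ToyTable &table, const std::string &name,
                               int16_t value, int16_t schema_default) {
  if (value != schema_default) table[name] = value;
}

// Reader: fall back to the default compiled into the READING code.
inline int16_t ReadWithDefault(const ToyTable &table, const std::string &name,
                               int16_t schema_default) {
  auto it = table.find(name);
  return it == table.end() ? schema_default : it->second;
}
```

If a writer built with `hp = 100` as the default stores a monster with 100 hit points, nothing is written; a reader built after the default was changed to 120 then reports 120 for that same data.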
|
||||
|
||||
### Enums
|
||||
|
||||
Define a sequence of named constants, each with a given value, or
|
||||
increasing by one from the previous one. The default first value
|
||||
is `0`. As you can see in the enum declaration, you specify the underlying
|
||||
integral type of the enum with `:` (in this case `byte`), which then determines
|
||||
the type of any fields declared with this enum type. If you omit the underlying
|
||||
type, it will be `short`.
|
||||
|
||||
### Unions
|
||||
|
||||
Unions share a lot of properties with enums, but instead of new names
|
||||
for constants, you use names of tables. You can then declare
|
||||
a union field which can hold a reference to any of those types, and
|
||||
additionally a hidden field with the suffix `_type` is generated that
|
||||
holds the corresponding enum value, allowing you to know which type to
|
||||
cast to at runtime.
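
In C++ terms, reading a union field amounts to a tagged dispatch. The sketch below uses plain structs to show the pattern; every type and function name in it (`AnySketch`, `HolderSketch`, `DescribeSketch`, and so on) is hypothetical and does not correspond to generated code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum AnySketch : uint8_t {
  AnySketch_NONE = 0,
  AnySketch_Monster = 1,
  AnySketch_Weapon = 2
};

struct MonsterSketch { int16_t hp; };
struct WeaponSketch { int16_t damage; };

struct HolderSketch {
  AnySketch test_type;  // plays the role of the hidden `_type` field
  const void *test;     // untyped reference to the union value
};

// Check the type tag first, then cast the reference to the matching type.
inline std::string DescribeSketch(const HolderSketch &h) {
  switch (h.test_type) {
    case AnySketch_Monster:
      return "monster hp=" +
             std::to_string(static_cast<const MonsterSketch *>(h.test)->hp);
    case AnySketch_Weapon:
      return "weapon damage=" +
             std::to_string(static_cast<const WeaponSketch *>(h.test)->damage);
    default:
      return "none";
  }
}
```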
|
||||
|
||||
### Namespaces
|
||||
|
||||
These will generate the corresponding namespace in C++ for all helper
|
||||
code, and packages in Java. You can use `.` to specify nested namespaces /
|
||||
packages.
|
||||
|
||||
### Root type
|
||||
|
||||
This declares what you consider to be the root table (or struct) of the
|
||||
serialized data.
|
||||
|
||||
### Comments & documentation
|
||||
|
||||
May be written as in most C-based languages. Additionally, a triple
|
||||
comment (`///`) on a line by itself signals that a comment is documentation
|
||||
for whatever is declared on the line after it
|
||||
(table/struct/field/enum/union/element), and the comment is output
|
||||
in the corresponding C++ code. Multiple such lines per item are allowed.
|
||||
|
||||
### Attributes
|
||||
|
||||
Attributes may be attached to a declaration, behind a field, or after
|
||||
the name of a table/struct/enum/union. These may either have a value or
|
||||
not. Some attributes like `deprecated` are understood by the compiler,
|
||||
others are simply ignored (like `priority`), but are available to query
|
||||
if you parse the schema at runtime.
|
||||
This is useful if you write your own code generators/editors etc., and
|
||||
you wish to add additional information specific to your tool (such as a
|
||||
help text).
|
||||
|
||||
Currently understood attributes:
|
||||
|
||||
- `deprecated` (on a field): do not generate accessors for this field
|
||||
anymore, code should stop using this data.
|
||||
- `original_order` (on a table): since elements in a table do not need
|
||||
to be stored in any particular order, they are often optimized for
|
||||
space by sorting them by size. This attribute stops that from happening.
|
||||
- `force_align: size` (on a struct): force the alignment of this struct
|
||||
to be something higher than what it is naturally aligned to. Causes
|
||||
these structs to be aligned to that amount inside a buffer, IF that
|
||||
buffer is allocated with that alignment (which is not necessarily
|
||||
the case for buffers accessed directly inside a `FlatBufferBuilder`).
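
The caveat on `force_align` is easier to see with a small sketch. It uses standard C++ `alignas` as an analogy; `Vec3AlignedSketch` and `IsAlignedSketch` are hypothetical names, and nothing here is generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A struct whose alignment is forced above its natural 4-byte alignment.
struct alignas(16) Vec3AlignedSketch {
  float x;
  float y;
  float z;
};

// True when `p` is a multiple of `alignment` (which must be a power of two).
inline bool IsAlignedSketch(const void *p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

A struct placed at a 16-byte offset inside a buffer is only 16-byte aligned in memory if the buffer itself starts on a 16-byte boundary; with a base address that is merely 4-byte aligned, the same offset can land on a misaligned address.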
|
||||
|
||||
## Gotchas
|
||||
|
||||
### Schemas and version control
|
||||
|
||||
FlatBuffers relies on new field declarations being added at the end, and earlier
|
||||
declarations to not be removed, but be marked deprecated when needed. We think
|
||||
this is an improvement over the manual number assignment that happens in
|
||||
Protocol Buffers.
|
||||
|
||||
One place where this is possibly problematic however is source control. If user
|
||||
A adds a field, generates new binary data with this new schema, then tries to
|
||||
commit both to source control after user B already committed a new field also,
|
||||
and just auto-merges the schema, the binary files are now invalid compared to
|
||||
the new schema.
|
||||
|
||||
The solution of course is that you should not be generating binary data before
|
||||
your schema changes have been committed, ensuring consistency with the rest of
|
||||
the world.
|
||||
|
||||
127
docs/source/WhitePaper.md
Executable file
@@ -0,0 +1,127 @@
|
||||
# FlatBuffers white paper
|
||||
|
||||
This document tries to shed some light on the "why" of FlatBuffers, a
|
||||
new serialization library.
|
||||
|
||||
## Motivation
|
||||
|
||||
Back in the good old days, performance was all about instructions and
|
||||
cycles. Nowadays, processing units have run so far ahead of the memory
|
||||
subsystem, that making an efficient application should start and finish
|
||||
with thinking about memory. How much you use of it. How you lay it out
|
||||
and access it. How you allocate it. When you copy it.
|
||||
|
||||
Serialization is a pervasive activity in a lot of programs, and a common
|
||||
source of memory inefficiency, with lots of temporary data structures
|
||||
needed to parse and represent data, and inefficient allocation patterns
|
||||
and locality.
|
||||
|
||||
If it would be possible to do serialization with no temporary objects,
|
||||
no additional allocation, no copying, and good locality, this could be
|
||||
of great value. The reason serialization systems usually don't manage
|
||||
this is because it goes counter to forwards/backwards compatibility, and
|
||||
platform specifics like endianness and alignment.
|
||||
|
||||
FlatBuffers is what you get if you try anyway.
|
||||
|
||||
In particular, FlatBuffers' focus is on mobile hardware (where memory
|
||||
size and memory bandwidth are even more constrained than on desktop
|
||||
hardware), and applications that have the highest performance needs:
|
||||
games.
|
||||
|
||||
## FlatBuffers
|
||||
|
||||
*This is a summary of FlatBuffers functionality, with some rationale.
|
||||
A more detailed description can be found in the FlatBuffers
|
||||
documentation.*
|
||||
|
||||
### Summary
|
||||
|
||||
A FlatBuffer is a binary buffer containing nested objects (structs,
|
||||
tables, vectors, etc.) organized using offsets so that the data can be
|
||||
traversed in-place just like any pointer-based data structure. Unlike
|
||||
most in-memory data structures however, it uses strict rules of
|
||||
alignment and endianness (always little) to ensure these buffers are
|
||||
cross platform. Additionally, for objects that are tables, FlatBuffers
|
||||
provides forwards/backwards compatibility and general optionality of
|
||||
fields, to support most forms of format evolution.
|
||||
|
||||
You define your object types in a schema, which can then be compiled to
|
||||
C++ or Java for low to zero overhead reading & writing.
|
||||
Optionally, JSON data can be dynamically parsed into buffers.
|
||||
|
||||
### Tables
|
||||
|
||||
Tables are the cornerstone of FlatBuffers, since format evolution is
|
||||
essential for most applications of serialization. Typically, dealing
|
||||
with format changes is something that can be done transparently during
|
||||
the parsing process of most serialization solutions out there.
|
||||
But a FlatBuffer isn't parsed before it is accessed.
|
||||
|
||||
Tables get around this by using an extra indirection to access fields,
|
||||
through a *vtable*. Each table comes with a vtable (which may be shared
|
||||
between multiple tables with the same layout), and contains information
|
||||
about where the fields for this particular kind of table instance are stored.
|
||||
The vtable may also indicate that the field is not present (because this
|
||||
FlatBuffer was written with an older version of the software, or simply
|
||||
because the information was not necessary for this instance, or deemed
|
||||
deprecated), in which case a default value is returned.
|
||||
|
||||
Tables have a low overhead in memory (since vtables are small and
|
||||
shared) and in access cost (an extra indirection), but provide great
|
||||
flexibility. Tables may even cost less memory than the equivalent
|
||||
struct, since fields do not need to be stored when they are equal to
|
||||
their default.
|
||||
|
||||
FlatBuffers additionally offers "naked" structs, which do not offer
|
||||
forwards/backwards compatibility, but can be even smaller (useful for
|
||||
very small objects that are unlikely to change, e.g. a coordinate
|
||||
pair or an RGBA color).
|
||||
|
||||
### Schemas
|
||||
|
||||
While schemas reduce some generality (you can't just read any data
|
||||
without having its schema), they have a lot of upsides:
|
||||
|
||||
- Most information about the format can be factored into the generated
|
||||
code, reducing memory needed to store data, and time to access it.
|
||||
|
||||
- The strong typing of the data definitions means less error
|
||||
checking/handling at runtime (less can go wrong).
|
||||
|
||||
- A schema enables us to access a buffer without parsing.
|
||||
|
||||
FlatBuffer schemas are fairly similar to those of the incumbent,
|
||||
Protocol Buffers, and generally should be readable to those familiar
|
||||
with the C family of languages. We chose to improve upon the features
|
||||
offered by .proto files in the following ways:
|
||||
|
||||
- Deprecation of fields instead of manual field id assignment.
|
||||
Extending an object in a .proto means hunting for a free slot among
|
||||
the numbers (preferring lower numbers since they have a more compact
|
||||
representation). Besides being inconvenient, it also makes removing
|
||||
fields problematic: you either have to keep them, not making it
|
||||
obvious that this field shouldn't be read/written anymore, and still
|
||||
generating accessors. Or you remove it, but now you risk that
|
||||
there's still old data around that uses that field by the time
|
||||
someone reuses that field id, with nasty consequences.
|
||||
|
||||
- Differentiating between tables and structs (see above). Effectively
|
||||
all table fields are `optional`, and all struct fields are
|
||||
`required`.
|
||||
|
||||
- Having a native vector type instead of `repeated`. This gives you a
|
||||
length without having to collect all items, and in the case of
|
||||
scalars provides for a more compact representation, and one that
|
||||
guarantees adjacency.
|
||||
|
||||
- Having a native `union` type instead of using a series of `optional`
|
||||
fields, all of which must be checked individually.
|
||||
|
||||
- Being able to define defaults for all scalars, instead of having to
|
||||
deal with their optionality at each access.
|
||||
|
||||
- A parser that can deal with both schemas and data definitions (JSON
|
||||
compatible) uniformly.
|
||||
|
||||
|
||||
2359
docs/source/doxyfile
Executable file
File diff suppressed because it is too large