Initial commit of the FlatBuffers code.

Change-Id: I4c9f0f722490b374257adb3fec63e44ae93da920
Tested: using VS2010 / Xcode / gcc on Linux.
This commit is contained in:
Wouter van Oortmerssen
2014-01-27 16:52:49 -08:00
parent c1b43e22b0
commit 26a30738a4
102 changed files with 12647 additions and 0 deletions

49
docs/source/Benchmarks.md Executable file

@@ -0,0 +1,49 @@
# Benchmarks
Comparing against other serialization solutions, running on Windows 7
64bit. We use the LITE runtime for Protocol Buffers (less code / lower
overhead), and Rapid JSON, one of the fastest C++ JSON parsers around.
We compare against FlatBuffers with the binary wire format (as
intended), and also with JSON as the wire format with the optional JSON
parser (which, using a schema, parses JSON into a binary buffer that can
then be accessed as before).
The benchmark object is a set of about 10 objects containing an array, 4
strings, and a large variety of int/float scalar values of all sizes,
meant to be representative of game data, e.g. a scene format.
| | FlatBuffers (binary) | Protocol Buffers LITE | Rapid JSON | FlatBuffers (JSON) |
|--------------------------------------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| Decode + Traverse + Dealloc (1 million times, seconds) | 0.08 | 305 | 583 | 105 |
| Decode / Traverse / Dealloc (breakdown) | 0 / 0.08 / 0 | 220 / 3.6 / 81 | 294 / 0.9 / 287 | 70 / 0.08 / 35 |
| Encode (1 million times, seconds) | 3.2 | 185 | 650 | 169 |
| Wire format size (normal / zlib, bytes) | 344 / 220 | 228 / 174 | 1475 / 322 | 1029 / 298 |
| Memory needed to store decoded wire (bytes / blocks) | 0 / 0 | 760 / 20 | 65689 / 40 | 328 / 1 |
| Transient memory allocated during decode (KB) | 0 | 1 | 131 | 4 |
| Generated source code size (KB) | 4 | 61 | 0 | 4 |
| Field access in handwritten traversal code | accessors | accessors | manual error checking | accessors |
| Library source code (KB) | 15 | some subset of 3800 | 87 | 43 |
### Some other serialization systems we compared against but did not benchmark (yet), in rough order of applicability:
- Cap'n'Proto promises to reduce the overhead of Protocol Buffers much like FlatBuffers does,
though with a more complicated binary encoding and less flexibility (no
optional fields to allow deprecating fields or serializing with missing
fields for which defaults exist).
It currently also isn't fully cross-platform portable (lack of VS support).
- msgpack: has very minimal forwards/backwards compatibility support when used
with the typed C++ interface. Also lacks VS2010 support.
- Thrift: very similar to Protocol Buffers, but appears to be less efficient,
and to have more dependencies.
- XML: typically even slower than JSON, but has the advantage that it can be
parsed with a schema to reduce error-checking boilerplate code.
- YAML: a superset of JSON and otherwise very similar. Used by e.g. Unity.
- C# comes with built-in serialization functionality, as used by Unity also.
Being tied to the language, and having no automatic versioning support
limits its applicability.
- Project Anarchy (the free mobile engine by Havok) comes with a serialization
system, that however does no automatic versioning (have to code around new
fields manually), is very much tied to the rest of the engine, and works
without a schema to generate code (tied to your C++ class definition).

45
docs/source/Building.md Executable file

@@ -0,0 +1,45 @@
# Building
The system comes with a `cmake` file that should allow you to build the
compiler `flatc` and the tests (optionally). For details on `cmake`, see
<http://www.cmake.org>. In brief, depending on your platform, use one of
e.g.:
cmake -G "Unix Makefiles"
cmake -G "Visual Studio 10"
cmake -G "Xcode"
Then, build as normal for your platform. This should result in a `flatc`
executable, essential for the next steps.
Note that to use clang instead of gcc, you may need to set up your environment
variables, e.g.
`CC=/usr/bin/clang CXX=/usr/bin/clang++ cmake -G "Unix Makefiles"`.
Optionally, run the `flattests` executable
to ensure everything is working correctly on your system. If this fails,
please contact us!
The cmake file will also build two sample executables, `sample_binary` and
`sample_text`, see the corresponding `.cpp` file in the samples directory.
There is an `android` directory that contains all you need to build the test
executable on android (use the included `build_apk.sh` script, or use
`ndk_build` / `adb` etc. as usual). Upon running, it will output to the log
if tests succeeded or not.
There is usually no runtime to compile, as the code consists of a single
header, `include/flatbuffers/flatbuffers.h`. You should add the
`include` folder to your include paths. If you wish to be
able to load schemas and/or parse text into binary buffers at runtime,
you additionally need the other headers in `include/flatbuffers`. You must
also compile/link `src/idl_parser.cpp` (and `src/idl_gen_text.cpp` if you
also want to be able to convert binary to text).
For applications on Google Play that integrate this library, usage is tracked.
This tracking is done automatically using the embedded version string
(flatbuffer_version_string), and helps us continue to optimize it.
Aside from consuming a few extra bytes in your application binary, it shouldn't
affect your application at all. We use this information to let us know if
FlatBuffers is useful and if we should continue to invest in it. Since this is
open source, you are free to remove the version string but we would appreciate
if you would leave it in.

22
docs/source/Compiler.md Executable file

@@ -0,0 +1,22 @@
# Using the schema compiler
Usage:
flatc [ -c ] [ -j ] [ -b ] [ -t ] file1 file2 ..
The files are read and parsed in order, and can contain either schemas
or data (see below). Later files can make use of definitions in earlier
files. Depending on the flags passed, additional files may
be generated for each file processed:
- `-c` : Generate a C++ header for all definitions in this file (as
`filename_generated.h`). Skips data.
- `-j` : Generate Java classes.
- `-b` : If data is contained in this file, generate a
`filename_wire.bin` containing the binary flatbuffer.
- `-t` : If data is contained in this file, generate a
`filename_wire.txt` (for debugging).

226
docs/source/CppUsage.md Executable file

@@ -0,0 +1,226 @@
# Use in C++
Assuming you have written a schema using the above language in say
`mygame.fbs` (FlatBuffer Schema, though the extension doesn't matter),
you've generated a C++ header called `mygame_generated.h` using the
compiler (e.g. `flatc -c mygame.fbs`), you can now start using this in
your program by including the header. As noted, this header relies on
`flatbuffers/flatbuffers.h`, which should be in your include path.
### Writing in C++
To start creating a buffer, create an instance of `FlatBufferBuilder`
which will contain the buffer as it grows:
FlatBufferBuilder fbb;
Before we serialize a Monster, we need to first serialize any objects
that are contained there-in, i.e. we serialize the data tree using
depth first, pre-order traversal. This is generally easy to do on
any tree structures. For example:
auto name = fbb.CreateString("MyMonster");
unsigned char inv[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
auto inventory = fbb.CreateVector(inv, 10);
`CreateString` and `CreateVector` serialize these two built-in
datatypes, and return offsets into the serialized data indicating where
they are stored, such that `Monster` below can refer to them.
`CreateString` can also take an `std::string`, or a `const char *` with
an explicit length, and is suitable for holding UTF-8 and binary
data if needed.
`CreateVector` can also take an `std::vector`. The
offset it returns is typed, i.e. can only be used to set fields of the
correct type below. To create a vector of struct objects (which will
be stored as contiguous memory in the buffer), use `CreateVectorOfStructs`
instead.
Vec3 vec(1, 2, 3);
`Vec3` is the first example of code from our generated
header. Structs (unlike tables) translate to simple structs in C++, so
we can construct them in a familiar way.
We have now serialized the non-scalar components of the monster
example, so we could create the monster something like this:
auto mloc = CreateMonster(fbb, &vec, 150, 80, name, inventory, Color_Red, Offset<void>(0), Any_NONE);
Note that we're passing `150` for the `mana` field, which happens to be the
default value: this means the field will not actually be written to the buffer,
since we'll get that value anyway when we query it. This is a nice space
savings, since it is very common for fields to be at their default. It means
we also don't need to be scared to add fields only used in a minority of cases,
since they won't bloat up the buffer sizes if they're not actually used.
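To make the default-value behaviour above concrete, here is a minimal sketch of the idea in isolation. `SparseTableSketch` and its methods are hypothetical names invented for illustration, not part of the FlatBuffers API, and a `std::map` stands in for the real binary layout: a field is only stored when it differs from its default, and a reader that finds nothing stored falls back to the default.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical miniature of a table writer/reader (not the FlatBuffers API).
// It only illustrates "fields equal to their default are not stored".
class SparseTableSketch {
 public:
  // Stores the value only if it differs from the schema default.
  void AddI16(int slot, std::int16_t value, std::int16_t def) {
    if (value == def) return;  // nothing written: the reader will supply def
    fields_[slot] = value;
  }

  // Returns the stored value, or the default if the field was never written.
  std::int16_t GetI16(int slot, std::int16_t def) const {
    const auto it = fields_.find(slot);
    return it == fields_.end() ? def : it->second;
  }

  // Number of fields that actually take up space.
  std::size_t StoredFieldCount() const { return fields_.size(); }

 private:
  std::map<int, std::int16_t> fields_;
};
```

With `mana` at its default of 150 and `hp` set to 80 (default 100), only one field ends up stored, yet both read back with the expected values.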
We do something similar for the union field `test` by specifying a `0` offset
and the `NONE` enum value (part of every union) to indicate we don't actually
want to write this field.
Tables (like `Monster`) give you full flexibility on what fields you write
(unlike `Vec3`, which always has all fields set because it is a `struct`).
If you want even more control over this (i.e. skip fields even when they are
not default), instead of the convenient `CreateMonster` call we can also
build the object field-by-field manually:
MonsterBuilder mb(fbb);
mb.add_pos(&vec);
mb.add_hp(80);
mb.add_name(name);
mb.add_inventory(inventory);
auto mloc = mb.Finish();
We start with a temporary helper class `MonsterBuilder` (which is
defined in our generated code also), then call the various `add_`
methods to set fields, and `Finish` to complete the object. This is
pretty much the same code as you find inside `CreateMonster`, except
we're leaving out a few fields. Fields may also be added in any order,
though orderings with fields of the same size adjacent
to each other are most efficient in size, due to alignment. You should
not nest these Builder classes (serialize your
data in pre-order).
Regardless of whether you used `CreateMonster` or `MonsterBuilder`, you
now have an offset to the root of your data, and you can finish the
buffer using:
fbb.Finish(mloc);
The buffer is now ready to be stored somewhere, sent over the network,
be compressed, or whatever you'd like to do with it. You can access the
start of the buffer with `fbb.GetBufferPointer()`, and its size from
`fbb.GetSize()`.
`samples/sample_binary.cpp` is a complete code sample similar to
the code above, that also includes the reading code below.
### Reading in C++
If you've received a buffer from somewhere (disk, network, etc.) you can
directly start traversing it using:
auto monster = GetMonster(buffer_pointer);
`monster` is of type `Monster *`, and points to somewhere inside your
buffer. If you look in your generated header, you'll see it has
convenient accessors for all fields, e.g.
assert(monster->hp() == 80);
assert(monster->mana() == 150); // default
assert(strcmp(monster->name()->c_str(), "MyMonster") == 0);
These should all be true. Note that we never stored a `mana` value, so
it will return the default.
To access sub-objects, in this case the `Vec3`:
auto pos = monster->pos();
assert(pos);
assert(pos->z() == 3);
If we had not set the `pos` field during serialization, it would be
`NULL`.
Similarly, we can access elements of the inventory array:
auto inv = monster->inventory();
assert(inv);
assert(inv->Get(9) == 9);
### Direct memory access
As you can see from the above examples, all elements in a buffer are
accessed through generated accessors. This is because everything is
stored in little endian format on all platforms (the accessor
performs a swap operation on big endian machines), and also because
the layout of things is generally not known to the user.
For structs, layout is deterministic and guaranteed to be the same
across platforms (scalars are aligned to their
own size, and structs themselves to their largest member), and you
are allowed to access this memory directly by using `sizeof()` and
`memcpy` on the pointer to a struct, or even an array of structs.
To compute offsets to sub-elements of a struct, make sure they
are structs themselves, as then you can use the pointers to
figure out the offset without having to hardcode it. This is
handy for use of arrays of structs with calls like `glVertexAttribPointer`
in OpenGL or similar APIs.
It is important to note that structs are still little endian on all
machines, so only use tricks like this if you can guarantee you're not
shipping on a big endian machine (an `assert(FLATBUFFERS_LITTLEENDIAN)`
would be wise).
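As an illustration of why accessors exist, the following sketch reads a little-endian 32-bit value out of raw bytes in a way that gives the same result on any host. `HostIsLittleEndianSketch` and `ReadLE32Sketch` are hypothetical helpers written for this example; they are not the library's `EndianScalar`, only a stand-in for the same idea.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: detects the host byte order at run time.
inline bool HostIsLittleEndianSketch() {
  const std::uint16_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Hypothetical helper: assembles a 32-bit value from four little-endian
// bytes. Shifts operate on values, not memory, so no swap is needed.
inline std::uint32_t ReadLE32Sketch(const unsigned char *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// A raw memcpy of the same bytes only matches on little-endian hosts,
// which is why direct struct access should be guarded as described above.
inline bool RawCopyMatchesSketch(const unsigned char *p) {
  std::uint32_t raw = 0;
  std::memcpy(&raw, p, sizeof(raw));
  return raw == ReadLE32Sketch(p);
}
```

On a little-endian host `RawCopyMatchesSketch` returns `true` for any input; on a big-endian host it generally does not, which is the situation the suggested assert protects against.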
## Text & schema parsing
Using binary buffers with the generated header provides a super low
overhead use of FlatBuffer data. There are, however, times when you want
to use text formats, for example because it interacts better with source
control, or you want to give your users easy access to data.
Another reason might be that you already have a lot of data in JSON
format, or a tool that generates JSON, and if you can write a schema for
it, this will provide you an easy way to use that data directly.
There are two ways to use text formats:
### Using the compiler as a conversion tool
This is the preferred path, as it doesn't require you to add any new
code to your program, and is maximally efficient since you can ship with
binary data. The disadvantage is that it is an extra step for your
users/developers to perform, though you might be able to automate it.
flatc -b myschema.fbs mydata.json
This will generate the binary file `mydata_wire.bin` which can be loaded
as before.
### Making your program capable of loading text directly
This gives you maximum flexibility. You could even opt to support both,
i.e. check for both files, and regenerate the binary from text when
required, otherwise just load the binary.
This option is currently only available for C++, or Java through JNI.
As mentioned in the section "Building" above, this technique requires
you to link a few more files into your program, and you'll want to include
`flatbuffers/idl.h`.
Load text (either a schema or json) into an in-memory buffer (there is a
convenient `LoadFile()` utility function in `flatbuffers/util.h` if you
wish). Construct a parser:
flatbuffers::Parser parser;
Now you can parse any number of text files in sequence:
parser.Parse(text_file.c_str());
This works similarly to how the command-line compiler works: a sequence
of files parsed by the same `Parser` object allow later files to
reference definitions in earlier files. Typically this means you first
load a schema file (which populates `Parser` with definitions), followed
by one or more JSON files.
If there were any parsing errors, `Parse` will return `false`, and
`Parser::err` contains a human readable error string with a line number
etc, which you should present to the creator of that file.
After each JSON file, the `Parser::fbb` member variable is the
`FlatBufferBuilder` that contains the binary buffer version of that
file, that you can access as described above.
`samples/sample_text.cpp` is a code sample showing the above operations.
### Threading
None of the code is thread-safe, by design. That said, since currently a
FlatBuffer is read-only and entirely `const`, reading by multiple threads
is possible.

126
docs/source/FlatBuffers.md Normal file

@@ -0,0 +1,126 @@
# FlatBuffers
FlatBuffers is an efficient cross platform serialization library for C++ and
Java. It was created at Google specifically for game development and other
performance-critical applications.
It is available as open source under the Apache license, v2 (see LICENSE.txt).
## Why use FlatBuffers?
- **Access to serialized data without parsing/unpacking** - What sets
FlatBuffers apart is that it represents hierarchical data in a flat
binary buffer in such a way that it can still be accessed directly
without parsing/unpacking, while also still supporting data
structure evolution (forwards/backwards compatibility).
- **Memory efficiency and speed** - The only memory needed to access
your data is that of the buffer. It requires 0 additional allocations.
FlatBuffers is also very
suitable for use with mmap (or streaming), requiring only part of the
buffer to be in memory. Access is close to the speed of raw
struct access with only one extra indirection (a kind of vtable) to
allow for format evolution and optional fields. It is aimed at
projects where spending time and space (many memory allocations) to
be able to access or construct serialized data is undesirable, such
as in games or any other performance sensitive applications. See the
[benchmarks](md__benchmarks.html) for details.
- **Flexible** - Optional fields mean that not only do you get great
forwards and backwards compatibility (increasingly important for
long-lived games: don't have to update all data with each new
version!), but you also have a lot of choice in what data you
write and what data you don't, and how you design data structures.
- **Tiny code footprint** - Small amounts of generated code, and just
a single small header as the minimum dependency, which is very easy
to integrate. Again, see the benchmark section for details.
- **Strongly typed** - Errors happen at compile time rather than
manually having to write repetitive and error prone run-time checks.
Useful code can be generated for you.
- **Convenient to use** - Generated C++ code allows for terse access
& construction code. Then there's optional functionality for parsing
schemas and JSON-like text representations at runtime efficiently if
needed (faster and more memory efficient than other JSON
parsers).
Java code supports object-reuse.
- **Cross platform C++11/Java code with no dependencies** - will work with
any recent gcc/clang and VS2010. Comes with build files for the tests &
samples (Android .mk files, and cmake for all other platforms).
### Why not use Protocol Buffers, or .. ?
Protocol Buffers is indeed relatively similar to FlatBuffers,
with the primary difference being that FlatBuffers does not need a parsing/
unpacking step to a secondary representation before you can
access data, often coupled with per-object memory allocation. The code
is an order of magnitude bigger, too. Protocol Buffers has neither optional
text import/export nor schema language features like unions.
### But all the cool kids use JSON!
JSON is very readable (which is why we use it as our optional text
format) and very convenient when used together with dynamically typed
languages (such as JavaScript). When serializing data from statically
typed languages, however, JSON not only has the obvious drawback of runtime
inefficiency, but also forces you to write *more* code to access data
(counterintuitively) due to its dynamic-typing serialization system.
In this context, it is only a better choice for systems that have very
little to no information ahead of time about what data needs to be stored.
Read more about the "why" of FlatBuffers in the
[white paper](md__white_paper.html).
## Usage in brief
This section is a quick rundown of how to use this system. Subsequent
sections provide a more in-depth usage guide.
- Write a schema file that allows you to define the data structures
you may want to serialize. Fields can have a scalar type
(ints/floats of all sizes), or they can be a: string; array of any type;
reference to yet another object; or, a set of possible objects (unions).
Fields are optional and have defaults, so they don't need to be
present for every object instance.
- Use `flatc` (the FlatBuffer compiler) to generate a C++ header (or Java
classes) with helper classes to access and construct serialized data. This
header (say `mydata_generated.h`) only depends on `flatbuffers.h`, which
defines the core functionality.
- Use the `FlatBufferBuilder` class to construct a flat binary buffer.
The generated functions allow you to add objects to this
buffer recursively, often as simply as making a single function call.
- Store or send your buffer somewhere!
- When reading it back, you can obtain the pointer to the root object
from the binary buffer, and from there traverse it conveniently
in-place with `object->field()`.
## In-depth documentation
- How to [build the compiler](md__building.html) and samples on various
platforms.
- How to [use the compiler](md__compiler.html).
- How to [write a schema](md__schemas.html).
- How to [use the generated C++ code](md__cpp_usage.html) in your own
programs.
- How to [use the generated Java code](md__java_usage.html) in your own
programs.
- Some [benchmarks](md__benchmarks.html) showing the advantage of using
FlatBuffers.
- A [white paper](md__white_paper.html) explaining the "why" of FlatBuffers.
- A description of the [internals](md__internals.html) of FlatBuffers.
- A formal [grammar](md__grammar.html) of the schema language.
## Online resources
- [github repository](http://github.com/google/flatbuffers)
- [landing page](http://google.github.io/flatbuffers)
- [FlatBuffers Google Group](http://group.google.com/group/flatbuffers)
- [FlatBuffers Issues Tracker](http://github.com/google/flatbuffers/issues)

30
docs/source/Grammar.md Executable file

@@ -0,0 +1,30 @@
# Formal Grammar of the schema language
schema = namespace\_decl | type\_decl | enum\_decl | root\_decl | object
namespace\_decl = `namespace` ident ( `.` ident )* `;`
type\_decl = ( `table` | `struct` ) ident metadata `{` field\_decl+ `}`
enum\_decl = ( `enum` | `union` ) ident [ `:` type ] metadata `{` commasep(
enumval\_decl ) `}`
root\_decl = `root_type` ident `;`
field\_decl = type `:` ident [ `=` scalar ] metadata `;`
type = `bool` | `byte` | `ubyte` | `short` | `ushort` | `int` | `uint` |
`float` | `long` | `ulong` | `double`
| `string` | `[` type `]` | ident
enumval\_decl = ident [ `=` integer\_constant ]
metadata = [ `(` commasep( ident [ `:` scalar ] ) `)` ]
scalar = integer\_constant | float\_constant | `true` | `false`
object = `{` commasep( ident `:` value ) `}`
value = scalar | object | string\_constant | `[` commasep( value ) `]`
commasep(x) = [ x ( `,` x )\* ]

244
docs/source/Internals.md Executable file

@@ -0,0 +1,244 @@
# FlatBuffer Internals
This section is entirely optional for the use of FlatBuffers. In normal
usage, you should never need the information contained herein. If you're
interested however, it should give you more of an appreciation of why
FlatBuffers is both efficient and convenient.
### Format components
A FlatBuffer is a binary file and in-memory format consisting mostly of
scalars of various sizes, all aligned to their own size. Each scalar is
also always represented in little-endian format, as this corresponds to
all commonly used CPUs today. FlatBuffers will also work on big-endian
machines, but will be slightly slower because of additional
byte-swap intrinsics.
On purpose, the format leaves a lot of details about where exactly
things live in memory undefined, e.g. fields in a table can have any
order, and objects to some extent can be stored in many orders. This is
because the format doesn't need this information to be efficient, and it
leaves room for optimization and extension (for example, fields can be
packed in a way that is most compact). Instead, the format is defined in
terms of offsets and adjacency only.
### Format identification
The format also doesn't contain information for format identification
and versioning, which is also by design. FlatBuffers is a statically typed
system, meaning the user of a buffer needs to know what kind of buffer
it is. FlatBuffers can of course be wrapped inside other containers
where needed, or you can use its union feature to dynamically identify
multiple possible sub-objects stored. Additionally, it can be used
together with the schema parser if full reflective capabilities are
desired.
Versioning is something that is intrinsically part of the format (the
optionality / extensibility of fields), so the format itself does not
need a version number (it's a meta-format, in a sense). We're hoping
that this format can accommodate all data needed. If format breaking
changes are ever necessary, it would become a new kind of format rather
than just a variation.
### Offsets
The most important and generic offset type (see `flatbuffers.h`) is
`offset_t`, which is currently always a `uint32_t`, and is used to
refer to all tables/unions/strings/vectors. 32bit is
intentional, since we want to keep the format binary compatible between
32 and 64bit systems, and a 64bit offset would bloat the size for almost
all uses. A version of this format with 64bit (or 16bit) offsets is easy to set
when needed. Unsigned means they can only point in one direction, which
typically is forward (towards a higher memory location). Any backwards
offsets will be explicitly marked as such.
The format starts with an `offset_t` to the root object in the buffer.
We have two kinds of objects, structs and tables.
### Structs
These are the simplest, and as mentioned, intended for simple data that
benefits from being extra efficient and doesn't need versioning /
extensibility. They are always stored inline in their parent (a struct,
table, or vector) for maximum compactness. Structs define a consistent
memory layout where all components are aligned to their size, and
structs aligned to their largest scalar member. This is done independent
of the alignment rules of the underlying compiler to guarantee a cross
platform compatible layout. This layout is then enforced in the generated
code.
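The layout rule above (every member aligned to its own size, the struct aligned to its largest scalar member) can be checked at compile time. The sketch below uses hypothetical stand-in types, `Vec3Sketch` and `MixedSketch`, rather than generated code, and assumes a platform where `float` is 4 bytes; the explicit padding members mimic the manual padding the text describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds an offset up to the next multiple of a power-of-two alignment.
constexpr std::size_t AlignUpSketch(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical stand-in for a struct of three 32-bit floats: no padding needed.
struct Vec3Sketch {
  float x, y, z;
};
static_assert(sizeof(Vec3Sketch) == 12, "three 4-byte floats");
static_assert(alignof(Vec3Sketch) == 4, "aligned to largest scalar member");

// Hypothetical struct with mixed sizes: padding is written out by hand so
// the layout does not depend on the compiler's own rules.
struct MixedSketch {
  std::uint8_t flag;      // offset 0
  std::uint8_t pad0_[3];  // manual padding up to a 4-byte boundary
  std::uint32_t id;       // offset 4
  std::uint16_t kind;     // offset 8
  std::uint8_t pad1_[2];  // manual padding so the size is a multiple of 4
};
static_assert(offsetof(MixedSketch, id) == AlignUpSketch(1, 4), "id at 4");
static_assert(offsetof(MixedSketch, kind) == 8, "kind at 8");
static_assert(sizeof(MixedSketch) == 12, "size is a multiple of alignment");
```

Because the padding is explicit, the compiler has nothing left to insert, which is the property that makes the layout portable.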
### Tables
These start with an `soffset_t` to a vtable (signed version of
`offset_t`, since vtables may be stored anywhere), followed by all the
fields as aligned scalars. Unlike structs, not all fields need to be
present. There is no set order and layout.
To be able to access fields regardless of these uncertainties, we go
through a vtable of offsets. Vtables are shared between any objects that
happen to have the same vtable values.
The elements of a vtable are all of type `voffset_t`, which is currently
a `uint16_t`. The first element is the number of elements of the vtable,
including this one. The second one is the size of the object, in bytes
(including the vtable offset). This size is used for streaming, to know
how many bytes to read to be able to access all fields of the object.
The remaining elements are the N offsets, where N is the number of fields
declared in the schema when the code that constructed this buffer was
compiled (thus, the size of the vtable is N + 2 elements).
All accessor functions in the generated code for tables contain the
offset into this table as a constant. This offset is checked against the
first field (the number of elements), to protect against newer code
reading older data. If this offset is out of range, or the vtable entry
is 0, that means the field is not present in this object, and the
default value is returned. Otherwise, the entry is used as offset to the
field to be read.
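The lookup just described can be sketched against a hand-built buffer. This is not the library's implementation: the function names are hypothetical, the vtable is indexed by element number to follow the wording above, and the sketch assumes the vtable sits at `table position - soffset`, a sign convention the text does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Little-endian reads that behave the same on any host.
inline std::uint16_t ReadU16(const Buffer &b, std::size_t pos) {
  return static_cast<std::uint16_t>(b[pos] | (b[pos + 1] << 8));
}

inline std::int32_t ReadI32(const Buffer &b, std::size_t pos) {
  const std::uint32_t v = static_cast<std::uint32_t>(b[pos]) |
                          (static_cast<std::uint32_t>(b[pos + 1]) << 8) |
                          (static_cast<std::uint32_t>(b[pos + 2]) << 16) |
                          (static_cast<std::uint32_t>(b[pos + 3]) << 24);
  return static_cast<std::int32_t>(v);
}

// Hypothetical accessor: `element` is the index into the vtable
// (elements 0 and 1 are the element count and the object size).
inline std::int16_t GetI16FieldSketch(const Buffer &b, std::size_t table_pos,
                                      std::uint16_t element, std::int16_t def) {
  // Assumption: the vtable is located at table_pos - soffset.
  const std::ptrdiff_t soffset = ReadI32(b, table_pos);
  const std::size_t vtable_pos = static_cast<std::size_t>(
      static_cast<std::ptrdiff_t>(table_pos) - soffset);
  const std::uint16_t num_elements = ReadU16(b, vtable_pos);
  if (element >= num_elements) return def;  // data written by older code
  const std::uint16_t field_offset = ReadU16(b, vtable_pos + 2u * element);
  if (field_offset == 0) return def;        // field not present
  return static_cast<std::int16_t>(ReadU16(b, table_pos + field_offset));
}

// Hand-built example: a vtable at byte 0, followed by a table at byte 8.
inline Buffer MakeExampleBuffer() {
  return Buffer{
      0x04, 0x00,              // vtable[0]: 4 elements in this vtable
      0x08, 0x00,              // vtable[1]: object size in bytes
      0x00, 0x00,              // vtable[2]: first field absent
      0x04, 0x00,              // vtable[3]: second field at table + 4
      0x08, 0x00, 0x00, 0x00,  // table: soffset 8, so the vtable is at 8 - 8
      0x50, 0x00,              // second field, value 80
      0x00, 0x00,              // padding
  };
}
```

Reading element 2 returns the supplied default because its entry is 0, element 3 returns the stored 80, and element 4 returns the default because it lies beyond the element count.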
### Strings and Vectors
Strings are simply a vector of bytes, and are always
null-terminated. Vectors are stored as contiguous aligned scalar
elements prefixed by a count.
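A rough sketch of that string layout follows. The helper names are hypothetical, and two details are assumptions rather than statements from the text: the count is taken to be a 32-bit little-endian value, and it is taken to exclude the null terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical encoder: count prefix, then the bytes, then a terminator.
// Assumption: 32-bit little-endian count that excludes the terminator.
inline std::vector<std::uint8_t> EncodeStringSketch(const std::string &s) {
  std::vector<std::uint8_t> out;
  const std::uint32_t count = static_cast<std::uint32_t>(s.size());
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<std::uint8_t>((count >> shift) & 0xFF));
  }
  out.insert(out.end(), s.begin(), s.end());
  out.push_back(0);  // null terminator, so the bytes work as a C string
  return out;
}

// Reads the count prefix back.
inline std::uint32_t DecodeCountSketch(const std::vector<std::uint8_t> &v) {
  return static_cast<std::uint32_t>(v[0]) |
         (static_cast<std::uint32_t>(v[1]) << 8) |
         (static_cast<std::uint32_t>(v[2]) << 16) |
         (static_cast<std::uint32_t>(v[3]) << 24);
}

// Points at the character data in place; no copy is made.
inline const char *CStrSketch(const std::vector<std::uint8_t> &v) {
  return reinterpret_cast<const char *>(v.data() + 4);
}
```

The terminator costs one byte per string but lets the stored bytes be handed to C APIs directly, without copying.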
### Construction
The current implementation constructs these buffers backwards, since
that significantly reduces the amount of bookkeeping and simplifies the
construction API.
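One way to picture backwards construction is a buffer that fills from its end towards its start. The class below is a hypothetical sketch, not `FlatBufferBuilder`: each `Push` places data in front of what is already there and returns the distance from the end of the buffer, a position that stays valid even when the storage is reallocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch of a buffer that grows downwards.
class BackwardBufferSketch {
 public:
  explicit BackwardBufferSketch(std::size_t initial = 16)
      : buf_(initial ? initial : 1), head_(buf_.size()) {}

  // Places `len` bytes in front of the existing data and returns the
  // distance from the end of the buffer to the start of the new data.
  std::size_t Push(const void *data, std::size_t len) {
    if (len > head_) Grow(len);
    head_ -= len;
    std::memcpy(buf_.data() + head_, data, len);
    return Size();
  }

  std::size_t Size() const { return buf_.size() - head_; }
  const std::uint8_t *Data() const { return buf_.data() + head_; }

 private:
  // Doubles the storage until `needed` more bytes fit, keeping the
  // existing data flush against the end of the new storage.
  void Grow(std::size_t needed) {
    const std::size_t used = Size();
    std::size_t new_size = buf_.size() * 2;
    while (new_size - used < needed) new_size *= 2;
    std::vector<std::uint8_t> bigger(new_size);
    std::memcpy(bigger.data() + new_size - used, buf_.data() + head_, used);
    buf_.swap(bigger);
    head_ = new_size - used;
  }

  std::vector<std::uint8_t> buf_;
  std::size_t head_;  // index of the first used byte
};
```

Pushing a child first and its parent second leaves the parent at the front of the finished data, with the child's end-relative position unchanged, which is part of why children are serialized before the objects that refer to them.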
### Code example
Here's an example of the code that gets generated for `samples/monster.fbs`.
What follows is the entire file, broken up by comments:
// automatically generated, do not modify
#include "flatbuffers/flatbuffers.h"
namespace MyGame {
namespace Sample {
Nested namespace support.
enum {
Color_Red = 0,
Color_Green = 1,
Color_Blue = 2,
};
inline const char **EnumNamesColor() {
static const char *names[] = { "Red", "Green", "Blue", nullptr };
return names;
}
inline const char *EnumNameColor(int e) { return EnumNamesColor()[e]; }
Enums and convenient reverse lookup.
enum {
Any_NONE = 0,
Any_Monster = 1,
};
inline const char **EnumNamesAny() {
static const char *names[] = { "NONE", "Monster", nullptr };
return names;
}
inline const char *EnumNameAny(int e) { return EnumNamesAny()[e]; }
Unions share a lot with enums.
struct Vec3;
struct Monster;
Predeclare all datatypes since there may be circular references.
MANUALLY_ALIGNED_STRUCT(4) Vec3 {
private:
float x_;
float y_;
float z_;
public:
Vec3(float x, float y, float z)
: x_(flatbuffers::EndianScalar(x)), y_(flatbuffers::EndianScalar(y)), z_(flatbuffers::EndianScalar(z)) {}
float x() const { return flatbuffers::EndianScalar(x_); }
float y() const { return flatbuffers::EndianScalar(y_); }
float z() const { return flatbuffers::EndianScalar(z_); }
};
STRUCT_END(Vec3, 12);
These ugly macros do a couple of things: they turn off any padding the compiler
might normally do, since we add padding manually (though none in this example),
and they enforce alignment chosen by FlatBuffers. This ensures the layout of
this struct will look the same regardless of compiler and platform. Note that
the fields are private: this is because these store little endian scalars
regardless of platform (since this is part of the serialized data).
`EndianScalar` then converts back and forth, which is a no-op on all current
mobile and desktop platforms, and a single machine instruction on the few
remaining big endian platforms.
struct Monster : private flatbuffers::Table {
const Vec3 *pos() const { return GetStruct<const Vec3 *>(4); }
int16_t mana() const { return GetField<int16_t>(6, 150); }
int16_t hp() const { return GetField<int16_t>(8, 100); }
const flatbuffers::String *name() const { return GetPointer<const flatbuffers::String *>(10); }
const flatbuffers::Vector<uint8_t> *inventory() const { return GetPointer<const flatbuffers::Vector<uint8_t> *>(14); }
int8_t color() const { return GetField<int8_t>(16, 2); }
};
Tables are a bit more complicated. A table accessor struct is used to point at
the serialized data for a table, which always starts with an offset to its
vtable. It derives from `Table`, which contains the `GetField` helper functions.
GetField takes a vtable offset, and a default value. It will look in the vtable
at that offset. If the offset is out of bounds (data from an older version) or
the vtable entry is 0, the field is not present and the default is returned.
Otherwise, it uses the entry as an offset into the table to locate the field.
struct MonsterBuilder {
flatbuffers::FlatBufferBuilder &fbb_;
flatbuffers::uoffset_t start_;
void add_pos(const Vec3 *pos) { fbb_.AddStruct(4, pos); }
void add_mana(int16_t mana) { fbb_.AddElement<int16_t>(6, mana, 150); }
void add_hp(int16_t hp) { fbb_.AddElement<int16_t>(8, hp, 100); }
void add_name(flatbuffers::Offset<flatbuffers::String> name) { fbb_.AddOffset(10, name); }
void add_inventory(flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory) { fbb_.AddOffset(14, inventory); }
void add_color(int8_t color) { fbb_.AddElement<int8_t>(16, color, 2); }
MonsterBuilder(flatbuffers::FlatBufferBuilder &_fbb) : fbb_(_fbb) { start_ = fbb_.StartTable(); }
flatbuffers::Offset<Monster> Finish() { return flatbuffers::Offset<Monster>(fbb_.EndTable(start_, 7)); }
};
`MonsterBuilder` is the base helper struct to construct a table using a
`FlatBufferBuilder`. You can add the fields in any order, and the `Finish`
call will ensure the correct vtable gets generated.
inline flatbuffers::Offset<Monster> CreateMonster(flatbuffers::FlatBufferBuilder &_fbb, const Vec3 *pos, int16_t mana, int16_t hp, flatbuffers::Offset<flatbuffers::String> name, flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory, int8_t color) {
MonsterBuilder builder_(_fbb);
builder_.add_inventory(inventory);
builder_.add_name(name);
builder_.add_pos(pos);
builder_.add_hp(hp);
builder_.add_mana(mana);
builder_.add_color(color);
return builder_.Finish();
}
`CreateMonster` is a convenience function that calls all functions in
`MonsterBuilder` above for you. Note that if you pass values which are
defaults as arguments, it will not actually construct that field, so
you can probably use this function instead of the builder class in
almost all cases.
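
To make the default-skipping behaviour concrete, here is a toy model of it. `ToyBuilder` and `CreateToyMonster` are hypothetical names made up for this sketch; the toy only records which vtable slots were written and does not produce a real buffer. The slot numbers and defaults are copied from the generated `MonsterBuilder` shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Toy stand-in for FlatBufferBuilder (hypothetical): it only remembers which
// vtable slots received a value, which is enough to show default elision.
class ToyBuilder {
 public:
  template <typename T>
  void AddElement(uint16_t vtable_offset, T value, T default_value) {
    // Field slots start after the two 16-bit vtable header entries.
    assert(vtable_offset >= 4 && vtable_offset % 2 == 0);
    if (value == default_value) return;  // defaults are never stored
    stored_[vtable_offset] = static_cast<int64_t>(value);
  }
  bool Has(uint16_t vtable_offset) const { return stored_.count(vtable_offset) != 0; }
  std::size_t FieldCount() const { return stored_.size(); }

 private:
  std::map<uint16_t, int64_t> stored_;
};

// Mirrors the shape of CreateMonster for the three scalar fields only.
// The call order is arbitrary: fields are identified by slot, not position.
inline ToyBuilder CreateToyMonster(int16_t mana, int16_t hp, int8_t color) {
  ToyBuilder builder;
  builder.AddElement<int16_t>(8, hp, 100);
  builder.AddElement<int16_t>(6, mana, 150);
  builder.AddElement<int8_t>(16, color, 2);
  return builder;
}
```

Calling `CreateToyMonster(150, 80, 2)` stores a single field, `hp`, because `mana` and `color` match their schema defaults.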

    inline const Monster *GetMonster(const void *buf) { return flatbuffers::GetRoot<Monster>(buf); }

This function is only generated for the root table type, to be able to
start traversing a FlatBuffer from a raw buffer pointer.
}; // namespace MyGame
}; // namespace Sample

79
docs/source/JavaUsage.md Executable file

@@ -0,0 +1,79 @@
# Use in Java
There's experimental support for reading FlatBuffers in Java. Generate code
for Java with the `-j` option to `flatc`.
See `javaTest.java` for an example. Essentially, you read a FlatBuffer binary
file into a `byte[]`, which you then turn into a `ByteBuffer`, which you pass to
the `getRootAsMonster` function:

    ByteBuffer bb = ByteBuffer.wrap(data);
    Monster monster = Monster.getRootAsMonster(bb);

Now you can access values much like C++:

    short hp = monster.hp();
    Vec3 pos = monster.pos();

Note that whenever you access a new object like in the `pos` example above,
a new temporary accessor object gets created. If your code is very performance
sensitive (you iterate through a lot of objects), there's a second `pos()`
method to which you can pass a `Vec3` object you've already created. This allows
you to reuse it across many calls and reduce the amount of object allocation (and
thus garbage collection) your program does.
Sadly the string accessors currently always create a new string when accessed,
since FlatBuffers' UTF-8 strings can't be read in-place by Java.
Vector access is also a bit different from C++: you pass an extra index
to the vector field accessor. Then a second method with the same name
suffixed by `_length` lets you know the number of elements you can access:

    for (int i = 0; i < monster.inventory_length(); i++)
      monster.inventory(i); // do something here

You can also construct these buffers in Java using the static methods found
in the generated code, and the FlatBufferBuilder class:

    FlatBufferBuilder fbb = new FlatBufferBuilder();

Create strings:

    int str = fbb.createString("MyMonster");

Create a table with a struct contained therein:

    Monster.startMonster(fbb);
    Monster.addPos(fbb, Vec3.createVec3(fbb, 1.0f, 2.0f, 3.0f, 3.0, (byte)4, (short)5, (byte)6));
    Monster.addHp(fbb, (short)80);
    Monster.addName(fbb, str);
    Monster.addInventory(fbb, inv);
    Monster.addTest_type(fbb, (byte)1);
    Monster.addTest(fbb, mon2);
    Monster.addTest4(fbb, test4s);
    int mon = Monster.endMonster(fbb);

As you can see, the Java code for tables does not use a convenient
`createMonster` call like the C++ code. This is to create the buffer without
using temporary object allocation (since the `Vec3` is an inline component of
`Monster`, it has to be created right where it is added, whereas the name and
the inventory are not inline).
Structs do have convenient methods that even have arguments for nested structs.
Vectors also use this start/end pattern to allow vectors of both scalar types
and structs:

    Monster.startInventoryVector(fbb, 5);
    for (byte i = 4; i >= 0; i--) fbb.addByte(i);
    int inv = fbb.endVector();

You can use the generated method `startInventoryVector` to conveniently call
`startVector` with the right element size. You pass the number of
elements you want to write. You write the elements backwards since the buffer
is being constructed back to front.
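
Why writing backwards produces a forward-ordered vector is easiest to see in a tiny model. The sketch below is written in C++ rather than Java, and `BackToFrontBuffer` and `BuildInventory` are hypothetical names for illustration only. It models just the "grow towards the front" behaviour, with none of the length prefixes or alignment a real builder adds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mini builder: every write lands just in front of the
// previously written data, so the buffer grows from the back to the front.
class BackToFrontBuffer {
 public:
  explicit BackToFrontBuffer(std::size_t capacity)
      : buf_(capacity), head_(capacity) {}

  void AddByte(uint8_t b) {
    assert(head_ > 0);  // this sketch does not grow the buffer
    buf_[--head_] = b;
  }

  // The bytes written so far, in the order they sit in memory.
  std::vector<uint8_t> Data() const {
    return std::vector<uint8_t>(
        buf_.begin() + static_cast<std::ptrdiff_t>(head_), buf_.end());
  }

 private:
  std::vector<uint8_t> buf_;
  std::size_t head_;
};

// Same loop shape as the Java example: elements are added last to first.
inline std::vector<uint8_t> BuildInventory() {
  BackToFrontBuffer fbb(16);
  for (int i = 4; i >= 0; i--) fbb.AddByte(static_cast<uint8_t>(i));
  return fbb.Data();
}
```

Because element 4 is written first and ends up at the highest address, a reader walking the memory front to back sees 0, 1, 2, 3, 4.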
## Text Parsing
There currently is no support for parsing text (schemas and JSON) directly
from Java, though you could use the C++ parser through JNI. Please see the
C++ documentation for more on text parsing.

198
docs/source/Schemas.md Executable file

@@ -0,0 +1,198 @@
# Writing a schema
The syntax of the schema language (aka IDL, Interface Definition
Language) should look quite familiar to users of any of the C family of
languages, and also to users of other IDLs. Let's look at an example
first:

    // example IDL file

    namespace MyGame;

    enum Color : byte { Red = 1, Green, Blue }

    union Any { Monster, Weapon, Pickup }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
      test:Any;
    }

    root_type Monster;

(Weapon & Pickup not defined as part of this example).
### Tables
Tables are the main way of defining objects in FlatBuffers, and consist
of a name (here `Monster`) and a list of fields. Each field has a name,
a type, and optionally a default value (if omitted, it defaults to 0 /
NULL).
Each field is optional: It does not have to appear in the wire
representation, and you can choose to omit fields for each individual
object. As a result, you have the flexibility to add fields without fear of
bloating your data. This design is also FlatBuffers' mechanism for forward
and backwards compatibility. Note that:
- You can add new fields in the schema ONLY at the end of a table
definition. Older data will still
read correctly, and give you the default value when read. Older code
will simply ignore the new field.
- You cannot delete fields you don't use anymore from the schema,
but you can simply
stop writing them into your data for almost the same effect.
Additionally you can mark them as `deprecated` as in the example
above, which will prevent the generation of accessors in the
generated C++, as a way to enforce the field not being used any more.
(careful: this may break code!).
- You may change field names and table names, if you're ok with your
code breaking until you've renamed them there too.
### Structs
Similar to a table, only now none of the fields are optional (so no defaults
either), and fields may not be added or be deprecated. Structs may only contain
scalars or other structs. Use this for
simple objects where you are very sure no changes will ever be made
(as is quite clear in the `Vec3` example). Structs use less memory than
tables and are even faster to access (they are always stored in-line in their
parent object, and use no virtual table).
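
As a rough picture of "stored in-line, no virtual table", the sketch below treats a struct as nothing more than its fields laid out back to back at a position the schema fixes in advance. It assumes 4-byte IEEE floats and a little-endian host. `ReadInlineVec3` is a hypothetical helper for this example, and the `Vec3` here is a hand-written mirror of the schema's struct rather than generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hand-written mirror of the schema's Vec3: three floats, no vtable and no
// optional fields, so the layout is 12 consecutive bytes.
struct Vec3 {
  float x;
  float y;
  float z;
};
static_assert(sizeof(Vec3) == 12, "Vec3 is expected to be three packed floats");

// Copies a Vec3 out of its parent's bytes. There is no lookup step: the
// position inside the parent is fixed and known in advance.
inline Vec3 ReadInlineVec3(const uint8_t *parent, std::size_t offset_in_parent) {
  assert(parent != nullptr);
  Vec3 v;
  std::memcpy(&v, parent + offset_in_parent, sizeof(Vec3));
  return v;
}
```

The absence of any per-field indirection is what makes structs compact and fast, and it is also why their fields can never be added, removed or left out.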
### Types
Builtin scalar types are:
- 8 bit: `byte ubyte bool`
- 16 bit: `short ushort`
- 32 bit: `int uint float`
- 64 bit: `long ulong double`

Builtin non-scalar types are:

- Vector of any other type (denoted with `[type]`). Nesting vectors
requires you to wrap the inner vector in a struct/table rather than
writing `[[type]]`.
- `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
or general binary data use vectors (`[byte]` or `[ubyte]`) instead.
- References to other tables or structs, enums or unions (see
below).
You can't change types of fields once they're used, with the exception
of same-size data where a `reinterpret_cast` would give you a desirable result,
e.g. you could change a `uint` to an `int` if no values in current data use the
high bit yet.
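
The `uint` to `int` case can be checked with a few lines of C++. This is only an illustration of the bit-level argument; the function names are hypothetical and nothing here is part of the FlatBuffers API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets the 4 bytes of a stored `uint` as an `int`, which is what
// reading old data through the changed schema amounts to.
inline int32_t ReadUintFieldAsInt(uint32_t stored) {
  int32_t v;
  std::memcpy(&v, &stored, sizeof(v));
  return v;
}

// True if the value reads back unchanged after the uint -> int change,
// i.e. the high bit is not in use.
inline bool SafeToReadAsInt(uint32_t stored) {
  return (stored & 0x80000000u) == 0;
}

// Checked variant for data that is known to satisfy the precondition.
inline int32_t ReadMigratedField(uint32_t stored) {
  assert(SafeToReadAsInt(stored));
  return ReadUintFieldAsInt(stored);
}
```

A stored value with the high bit set would read back as a negative number, which is exactly the situation the caveat above warns about.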
### (Default) Values
Values are a sequence of digits, optionally followed by a `.` and more digits
for float constants, and optionally prefixed by a `-`. Non-scalar defaults are
currently not supported (always NULL).
You generally do not want to change default values after they're initially
defined. Fields that have the default value are not actually stored in the
serialized data but are generated in code, so when you change the default, you'd
now get a different value than from code generated from an older version of
the schema. There are situations however where this may be
desirable, especially if you can ensure a simultaneous rebuild of
all code.
### Enums
Define a sequence of named constants, each with a given value, or
increasing by one from the previous one. The default first value
is `0`. As you can see in the enum declaration, you specify the underlying
integral type of the enum with `:` (in this case `byte`), which then determines
the type of any fields declared with this enum type. If you omit the underlying
type, it will be `short`.
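
For the `Color` declaration in the example schema, the numbering rule works out as in the C++ sketch below. The `Color_` prefix and the use of a typed C++ enum are assumptions made for illustration; the generated code may name or declare these constants differently.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of `enum Color : byte { Red = 1, Green, Blue }`.
// Red is given explicitly; Green and Blue each increase by one.
enum Color : int8_t { Color_Red = 1, Color_Green, Color_Blue };

// `byte` as the underlying type means a Color field occupies one byte.
static_assert(sizeof(Color) == 1, "Color is expected to be one byte wide");

// Hypothetical helper mapping a value back to its schema name.
inline const char *ColorName(Color c) {
  assert(c >= Color_Red && c <= Color_Blue);
  switch (c) {
    case Color_Red: return "Red";
    case Color_Green: return "Green";
    case Color_Blue: return "Blue";
  }
  return "";
}
```

Since the first value is set to 1 here, no constant has the value 0 that would otherwise be the default first value.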
### Unions
Unions share a lot of properties with enums, but instead of new names
for constants, you use names of tables. You can then declare
a union field which can hold a reference to any of those types, and
additionally a hidden field with the suffix `_type` is generated that
holds the corresponding enum value, allowing you to know which type to
cast to at runtime.
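
The pairing of a union reference with its hidden `_type` field can be pictured as a tag plus an untyped pointer. The sketch below is a hand-written model under several assumptions: the `Any_` constant names, a `NONE` value of 0 for "no object", and the `Holder`, `Monster` and `Weapon` structs are all invented for illustration and are not generated code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed numbering: 0 means "nothing stored", then one value per union
// member in declaration order.
enum Any : uint8_t { Any_NONE = 0, Any_Monster, Any_Weapon, Any_Pickup };

// Minimal stand-ins for two of the union's table types.
struct Monster { int16_t hp; };
struct Weapon { int16_t damage; };

// Stand-in for a table with a union field `test`: the hidden `test_type`
// value travels alongside the untyped reference.
struct Holder {
  Any test_type;
  const void *test;
};

// Uses the type tag to decide which cast is valid before touching the data.
inline int16_t PrimaryStat(const Holder &h) {
  switch (h.test_type) {
    case Any_Monster:
      assert(h.test != nullptr);
      return static_cast<const Monster *>(h.test)->hp;
    case Any_Weapon:
      assert(h.test != nullptr);
      return static_cast<const Weapon *>(h.test)->damage;
    default:
      return 0;  // NONE, or a member this sketch does not model
  }
}
```

Reading the tag first is what makes the cast safe: the reference alone carries no information about which table it points to.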
### Namespaces
These will generate the corresponding namespace in C++ for all helper
code, and packages in Java. You can use `.` to specify nested namespaces /
packages.
### Root type
This declares what you consider to be the root table (or struct) of the
serialized data.
### Comments & documentation
May be written as in most C-based languages. Additionally, a triple
comment (`///`) on a line by itself signals that a comment is documentation
for whatever is declared on the line after it
(table/struct/field/enum/union/element), and the comment is output
in the corresponding C++ code. Multiple such lines per item are allowed.
### Attributes
Attributes may be attached to a declaration, behind a field, or after
the name of a table/struct/enum/union. These may either have a value or
not. Some attributes like `deprecated` are understood by the compiler,
others are simply ignored (like `priority`), but are available to query
if you parse the schema at runtime.
This is useful if you write your own code generators/editors etc., and
you wish to add additional information specific to your tool (such as a
help text).
Currently understood attributes:
- `deprecated` (on a field): do not generate accessors for this field
anymore, code should stop using this data.
- `original_order` (on a table): since elements in a table do not need
to be stored in any particular order, they are often optimized for
space by sorting them by size. This attribute stops that from happening.
- `force_align: size` (on a struct): force the alignment of this struct
to be something higher than what it is naturally aligned to. Causes
these structs to be aligned to that amount inside a buffer, IF that
buffer is allocated with that alignment (which is not necessarily
the case for buffers accessed directly inside a `FlatBufferBuilder`).
## Gotchas
### Schemas and version control
FlatBuffers relies on new field declarations being added at the end, and on earlier
declarations not being removed but marked deprecated when needed. We think
this is an improvement over the manual number assignment that happens in
Protocol Buffers.
One place where this is possibly problematic, however, is source control. Suppose
user A adds a field and generates new binary data with this new schema, while
user B has already committed a different new field. If user A then just
auto-merges the schema and commits both schema and data, the binary files are
invalid with respect to the merged schema.
The solution of course is that you should not be generating binary data before
your schema changes have been committed, ensuring consistency with the rest of
the world.

127
docs/source/WhitePaper.md Executable file

@@ -0,0 +1,127 @@
# FlatBuffers white paper
This document tries to shed some light on the "why" of FlatBuffers, a
new serialization library.
## Motivation
Back in the good old days, performance was all about instructions and
cycles. Nowadays, processing units have run so far ahead of the memory
subsystem that making an efficient application should start and finish
with thinking about memory. How much of it you use. How you lay it out
and access it. How you allocate it. When you copy it.
Serialization is a pervasive activity in a lot of programs, and a common
source of memory inefficiency, with lots of temporary data structures
needed to parse and represent data, and inefficient allocation patterns
and locality.
If it were possible to do serialization with no temporary objects,
no additional allocation, no copying, and good locality, this could be
of great value. The reason serialization systems usually don't manage
this is that it runs counter to forwards/backwards compatibility and
platform specifics like endianness and alignment.
FlatBuffers is what you get if you try anyway.
In particular, the focus of FlatBuffers is on mobile hardware (where memory
size and memory bandwidth are even more constrained than on desktop
hardware), and on applications that have the highest performance needs:
games.
## FlatBuffers
*This is a summary of FlatBuffers functionality, with some rationale.
A more detailed description can be found in the FlatBuffers
documentation.*
### Summary
A FlatBuffer is a binary buffer containing nested objects (structs,
tables, vectors, ...) organized using offsets so that the data can be
traversed in-place just like any pointer-based data structure. Unlike
most in-memory data structures however, it uses strict rules of
alignment and endianness (always little) to ensure these buffers are
cross platform. Additionally, for objects that are tables, FlatBuffers
provides forwards/backwards compatibility and general optionality of
fields, to support most forms of format evolution.
You define your object types in a schema, which can then be compiled to
C++ or Java for low to zero overhead reading & writing.
Optionally, JSON data can be dynamically parsed into buffers.
### Tables
Tables are the cornerstone of FlatBuffers, since format evolution is
essential for most applications of serialization. Typically, dealing
with format changes is something that can be done transparently during
the parsing process of most serialization solutions out there.
But a FlatBuffer isn't parsed before it is accessed.
Tables get around this by using an extra indirection to access fields,
through a *vtable*. Each table comes with a vtable (which may be shared
between multiple tables with the same layout), which records where the
fields of that particular table instance are stored.
The vtable may also indicate that the field is not present (because this
FlatBuffer was written with an older version of the software, or simply
because the information was not necessary for this instance, or deemed
deprecated), in which case a default value is returned.
Tables have a low overhead in memory (since vtables are small and
shared) and in access cost (an extra indirection), but provide great
flexibility. Tables may even cost less memory than the equivalent
struct, since fields do not need to be stored when they are equal to
their default.
FlatBuffers additionally offers "naked" structs, which do not offer
forwards/backwards compatibility, but can be even smaller (useful for
very small objects that are unlikely to change, e.g. a coordinate
pair or an RGBA color).
### Schemas
While schemas reduce some generality (you can't just read any data
without having its schema), they have a lot of upsides:
- Most information about the format can be factored into the generated
code, reducing memory needed to store data, and time to access it.
- The strong typing of the data definitions means less error
checking/handling at runtime (less can go wrong).
- A schema enables us to access a buffer without parsing.
FlatBuffer schemas are fairly similar to those of the incumbent,
Protocol Buffers, and generally should be readable to those familiar
with the C family of languages. We chose to improve upon the features
offered by .proto files in the following ways:
- Deprecation of fields instead of manual field id assignment.
Extending an object in a .proto means hunting for a free slot among
the numbers (preferring lower numbers since they have a more compact
representation). Besides being inconvenient, it also makes removing
fields problematic: either you keep them, which does not make it
obvious that they shouldn't be read/written anymore and still
generates accessors, or you remove them, and risk that old data
using a field is still around by the time someone reuses its
field id, with nasty consequences.
- Differentiating between tables and structs (see above). Effectively
all table fields are `optional`, and all struct fields are
`required`.
- Having a native vector type instead of `repeated`. This gives you a
length without having to collect all items, and in the case of
scalars provides for a more compact representation, and one that
guarantees adjacency.
- Having a native `union` type instead of using a series of `optional`
fields, all of which must be checked individually.
- Being able to define defaults for all scalars, instead of having to
deal with their optionality at each access.
- A parser that can deal with both schemas and data definitions (JSON
compatible) uniformly.

2359
docs/source/doxyfile Executable file

File diff suppressed because it is too large