Initial commit of the FlatBuffers code.

Change-Id: I4c9f0f722490b374257adb3fec63e44ae93da920
Tested: using VS2010 / Xcode / gcc on Linux.
2026-06-05 21:17:25 +00:00 · 2014-01-27 16:52:49 -08:00
parent c1b43e22b0
commit 26a30738a4
102 changed files with 12647 additions and 0 deletions
--- a/docs/source/Benchmarks.md
+++ b/docs/source/Benchmarks.md
@@ -0,0 +1,49 @@
+# Benchmarks
+
+Comparing against other serialization solutions, running on Windows 7
+64bit. We use the LITE runtime for Protocol Buffers (less code / lower
+overhead), and Rapid JSON, one of the fastest C++ JSON parsers around.
+
+We compare against FlatBuffers with the binary wire format (as
+intended), and also with JSON as the wire format with the optional JSON
+parser (which, using a schema, parses JSON into a binary buffer that can
+then be accessed as before).
+
+The benchmark object is a set of about 10 objects containing an array, 4
+strings, and a large variety of int/float scalar values of all sizes,
+meant to be representative of game data, e.g. a scene format.
+
+|                                                        | FlatBuffers (binary)  | Protocol Buffers LITE | Rapid JSON            | FlatBuffers (JSON)    |
+|--------------------------------------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
+| Decode + Traverse + Dealloc (1 million times, seconds) | 0.08                  | 305                   | 583                   | 105                   |
+| Decode / Traverse / Dealloc (breakdown)                | 0 / 0.08 / 0          | 220 / 3.6 / 81        | 294 / 0.9 / 287       | 70 / 0.08 / 35        |
+| Encode (1 million times, seconds)                      | 3.2                   | 185                   | 650                   | 169                   |
+| Wire format size (normal / zlib, bytes)                | 344 / 220             | 228 / 174             | 1475 / 322            | 1029 / 298            |
+| Memory needed to store decoded wire (bytes / blocks)   | 0 / 0                 | 760 / 20              | 65689 / 40            | 328 / 1               |
+| Transient memory allocated during decode (KB)          | 0                     | 1                     | 131                   | 4                     |
+| Generated source code size (KB)                        | 4                     | 61                    | 0                     | 4                     |
+| Field access in handwritten traversal code             | accessors             | accessors             | manual error checking | accessors             |
+| Library source code (KB)                               | 15                    | some subset of 3800   | 87                    | 43                    |
+
+### Some other serialization systems we compared against but did not benchmark (yet), in rough order of applicability:
+
+-   Cap'n'Proto promises to reduce the overhead of Protocol Buffers much like FlatBuffers does,
+    though with a more complicated binary encoding and less flexibility (no
+    optional fields to allow deprecating fields or serializing with missing
+    fields for which defaults exist).
+    It currently also isn't fully cross-platform portable (lack of VS support).
+-   msgpack: has very minimal forwards/backwards compatibility support when used
+    with the typed C++ interface. Also lacks VS2010 support.
+-   Thrift: very similar to Protocol Buffers, but appears to be less efficient,
+    and have more dependencies.
+-   XML: typically even slower than JSON, but has the advantage that it can be
+    parsed with a schema to reduce error-checking boilerplate code.
+-   YAML: a superset of JSON and otherwise very similar. Used by e.g. Unity.
+-   C# comes with built-in serialization functionality, as used by Unity also.
+    Being tied to the language, and having no automatic versioning support
+    limits its applicability.
+-   Project Anarchy (the free mobile engine by Havok) comes with a serialization
+    system, that however does no automatic versioning (have to code around new
+    fields manually), is very much tied to the rest of the engine, and works
+    without a schema to generate code (tied to your C++ class definition).
+
--- a/docs/source/Building.md
+++ b/docs/source/Building.md
@@ -0,0 +1,45 @@
+# Building
+
+The system comes with a `cmake` file that should allow you to build the
+compiler `flatc` and the tests (optionally). For details on `cmake`, see
+<http://www.cmake.org>. In brief, depending on your platform, use one of
+e.g.:
+
+    cmake -G "Unix Makefiles"
+    cmake -G "Visual Studio 10"
+    cmake -G "Xcode"
+
+Then, build as normal for your platform. This should result in a `flatc`
+executable, essential for the next steps.
+Note that to use clang instead of gcc, you may need to set up your environment
+variables, e.g.
+`CC=/usr/bin/clang CXX=/usr/bin/clang++ cmake -G "Unix Makefiles"`.
+
+Optionally, run the `flattests` executable
+to ensure everything is working correctly on your system. If this fails,
+please contact us!
+
+The cmake file will also build two sample executables, `sample_binary` and
+`sample_text`, see the corresponding `.cpp` file in the samples directory.
+
+There is an `android` directory that contains all you need to build the test
+executable on android (use the included `build_apk.sh` script, or use
+`ndk-build` / `adb` etc. as usual). Upon running, it will output to the log
+if tests succeeded or not.
+
+There is usually no runtime to compile, as the code consists of a single
+header, `include/flatbuffers/flatbuffers.h`. You should add the
+`include` folder to your include paths. If you wish to be
+able to load schemas and/or parse text into binary buffers at runtime,
+you additionally need the other headers in `include/flatbuffers`. You must
+also compile/link `src/idl_parser.cpp` (and `src/idl_gen_text.cpp` if you
+also want to be able to convert binary to text).
+
+For applications on Google Play that integrate this library, usage is tracked.
+This tracking is done automatically using the embedded version string
+(flatbuffer_version_string), and helps us continue to optimize it.
+Aside from consuming a few extra bytes in your application binary, it shouldn't
+affect your application at all. We use this information to let us know if
+FlatBuffers is useful and if we should continue to invest in it. Since this is
+open source, you are free to remove the version string but we would appreciate
+if you would leave it in.
--- a/docs/source/Compiler.md
+++ b/docs/source/Compiler.md
@@ -0,0 +1,22 @@
+# Using the schema compiler
+
+Usage:
+
+    flatc [ -c ] [ -j ] [ -b ] [ -t ] file1 file2 ..
+
+The files are read and parsed in order, and can contain either schemas
+or data (see below). Later files can make use of definitions in earlier
+files. Depending on the flags passed, additional files may
+be generated for each file processed:
+
+-   `-c` : Generate a C++ header for all definitions in this file (as
+    `filename_generated.h`). Skips data.
+
+-   `-j` : Generate Java classes.
+
+-   `-b` : If data is contained in this file, generate a
+    `filename_wire.bin` containing the binary flatbuffer.
+
+-   `-t` : If data is contained in this file, generate a
+    `filename_wire.txt` (for debugging).
+
--- a/docs/source/CppUsage.md
+++ b/docs/source/CppUsage.md
@@ -0,0 +1,226 @@
+# Use in C++
+
+Assuming you have written a schema using the above language in say
+`mygame.fbs` (FlatBuffer Schema, though the extension doesn't matter),
+and you've generated a C++ header called `mygame_generated.h` using the
+compiler (e.g. `flatc -c mygame.fbs`), you can now start using this in
+your program by including the header. As noted, this header relies on
+`flatbuffers/flatbuffers.h`, which should be in your include path.
+
+### Writing in C++
+
+To start creating a buffer, create an instance of `FlatBufferBuilder`
+which will contain the buffer as it grows:
+
+    FlatBufferBuilder fbb;
+
+Before we serialize a Monster, we need to first serialize any objects
+that are contained there-in, i.e. we serialize the data tree using
+depth first, pre-order traversal. This is generally easy to do on
+any tree structures. For example:
+
+    auto name = fbb.CreateString("MyMonster");
+
+    unsigned char inv[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
+    auto inventory = fbb.CreateVector(inv, 10);
+
+`CreateString` and `CreateVector` serialize these two built-in
+datatypes, and return offsets into the serialized data indicating where
+they are stored, such that `Monster` below can refer to them.
+
+`CreateString` can also take an `std::string`, or a `const char *` with
+an explicit length, and is suitable for holding UTF-8 and binary
+data if needed.
+
+`CreateVector` can also take an `std::vector`. The
+offset it returns is typed, i.e. can only be used to set fields of the
+correct type below. To create a vector of struct objects (which will
+be stored as contiguous memory in the buffer), use `CreateVectorOfStructs`
+instead.
+
+    Vec3 vec(1, 2, 3);
+
+`Vec3` is the first example of code from our generated
+header. Structs (unlike tables) translate to simple structs in C++, so
+we can construct them in a familiar way.
+
+We have now serialized the non-scalar components of the monster
+example, so we could create the monster something like this:
+
+    auto mloc = CreateMonster(fbb, &vec, 150, 80, name, inventory, Color_Red, Offset<void>(0), Any_NONE);
+
+Note that we're passing `150` for the `mana` field, which happens to be the
+default value: this means the field will not actually be written to the buffer,
+since we'll get that value anyway when we query it. This is a nice space
+savings, since it is very common for fields to be at their default. It means
+we also don't need to be scared to add fields only used in a minority of cases,
+since they won't bloat up the buffer sizes if they're not actually used.
+
+We do something similar for the union field `test` by specifying a `0` offset
+and the `NONE` enum value (part of every union) to indicate we don't actually
+want to write this field.
+
+Tables (like `Monster`) give you full flexibility on what fields you write
+(unlike `Vec3`, which always has all fields set because it is a `struct`).
+If you want even more control over this (i.e. skip fields even when they are
+not default), instead of the convenient `CreateMonster` call we can also
+build the object field-by-field manually:
+
+    MonsterBuilder mb(fbb);
+    mb.add_pos(&vec);
+    mb.add_hp(80);
+    mb.add_name(name);
+    mb.add_inventory(inventory);
+    auto mloc = mb.Finish();
+
+We start with a temporary helper class `MonsterBuilder` (which is
+defined in our generated code also), then call the various `add_`
+methods to set fields, and `Finish` to complete the object. This is
+pretty much the same code as you find inside `CreateMonster`, except
+we're leaving out a few fields. Fields may also be added in any order,
+though orderings with fields of the same size adjacent
+to each other are most efficient in size, due to alignment. You should
+not nest these Builder classes (serialize your
+data in pre-order).
+
+Regardless of whether you used `CreateMonster` or `MonsterBuilder`, you
+now have an offset to the root of your data, and you can finish the
+buffer using:
+
+    fbb.Finish(mloc);
+
+The buffer is now ready to be stored somewhere, sent over the network,
+be compressed, or whatever you'd like to do with it. You can access the
+start of the buffer with `fbb.GetBufferPointer()`, and its size from
+`fbb.GetSize()`.
+
+`samples/sample_binary.cpp` is a complete code sample similar to
+the code above, that also includes the reading code below.
+
+### Reading in C++
+
+If you've received a buffer from somewhere (disk, network, etc.) you can
+directly start traversing it using:
+
+    auto monster = GetMonster(buffer_pointer);
+
+`monster` is of type `Monster *`, and points to somewhere inside your
+buffer. If you look in your generated header, you'll see it has
+convenient accessors for all fields, e.g.
+
+    assert(monster->hp() == 80);
+    assert(monster->mana() == 150);  // default
+    assert(strcmp(monster->name()->c_str(), "MyMonster") == 0);
+
+These should all be true. Note that we never stored a `mana` value, so
+it will return the default.
+
+To access sub-objects, in this case the `Vec3`:
+
+    auto pos = monster->pos();
+    assert(pos);
+    assert(pos->z() == 3);
+
+If we had not set the `pos` field during serialization, it would be
+`NULL`.
+
+Similarly, we can access elements of the inventory array:
+
+    auto inv = monster->inventory();
+    assert(inv);
+    assert(inv->Get(9) == 9);
+
+### Direct memory access
+
+As you can see from the above examples, all elements in a buffer are
+accessed through generated accessors. This is because everything is
+stored in little endian format on all platforms (the accessor
+performs a swap operation on big endian machines), and also because
+the layout of things is generally not known to the user.
+
+For structs, layout is deterministic and guaranteed to be the same
+across platforms (scalars are aligned to their
+own size, and structs themselves to their largest member), and you
+are allowed to access this memory directly by using `sizeof()` and
+`memcpy` on the pointer to a struct, or even an array of structs.
+
+To compute offsets to sub-elements of a struct, make sure they
+are structs themselves, as then you can use the pointers to
+figure out the offset without having to hardcode it. This is
+handy for use of arrays of structs with calls like `glVertexAttribPointer`
+in OpenGL or similar APIs.
+
+It is important to note that structs are still little endian on all
+machines, so only use tricks like this if you can guarantee you're not
+shipping on a big endian machine (an `assert(FLATBUFFERS_LITTLEENDIAN)`
+would be wise).
+
+## Text & schema parsing
+
+Using binary buffers with the generated header provides a super low
+overhead use of FlatBuffer data. There are, however, times when you want
+to use text formats, for example because they interact better with source
+control, or you want to give your users easy access to data.
+
+Another reason might be that you already have a lot of data in JSON
+format, or a tool that generates JSON, and if you can write a schema for
+it, this will provide you an easy way to use that data directly.
+
+There are two ways to use text formats:
+
+### Using the compiler as a conversion tool
+
+This is the preferred path, as it doesn't require you to add any new
+code to your program, and is maximally efficient since you can ship with
+binary data. The disadvantage is that it is an extra step for your
+users/developers to perform, though you might be able to automate it.
+
+    flatc -b myschema.fbs mydata.json
+
+This will generate the binary file `mydata_wire.bin` which can be loaded
+as before.
+
+### Making your program capable of loading text directly
+
+This gives you maximum flexibility. You could even opt to support both,
+i.e. check for both files, and regenerate the binary from text when
+required, otherwise just load the binary.
+
+This option is currently only available for C++, or Java through JNI.
+
+As mentioned in the section "Building" above, this technique requires
+you to link a few more files into your program, and you'll want to include
+`flatbuffers/idl.h`.
+
+Load text (either a schema or JSON) into an in-memory buffer (there is a
+convenient `LoadFile()` utility function in `flatbuffers/util.h` if you
+wish). Construct a parser:
+
+    flatbuffers::Parser parser;
+
+Now you can parse any number of text files in sequence:
+
+    parser.Parse(text_file.c_str());
+
+This works similarly to how the command-line compiler works: a sequence
+of files parsed by the same `Parser` object allows later files to
+reference definitions in earlier files. Typically this means you first
+load a schema file (which populates `Parser` with definitions), followed
+by one or more JSON files.
+
+If there were any parsing errors, `Parse` will return `false`, and
+`Parser::err` contains a human readable error string with a line number
+etc, which you should present to the creator of that file.
+
+After each JSON file, the `Parser::fbb` member variable is the
+`FlatBufferBuilder` that contains the binary buffer version of that
+file, that you can access as described above.
+
+`samples/sample_text.cpp` is a code sample showing the above operations.
+
+### Threading
+
+None of the code is thread-safe, by design. That said, since currently a
+FlatBuffer is read-only and entirely `const`, reading by multiple threads
+is possible.
+
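Note on the "Direct memory access" section of CppUsage.md above: the two rules it relies on (scalars stored little endian, structs with a fixed layout) can be illustrated with a small standalone sketch. This is not code from `flatbuffers.h`; `ReadLittleEndian32`, `HostIsLittleEndian`, `PlainVec3`, and `CopyVec3` are hypothetical names invented here, and `PlainVec3` only stands in for a three-float struct such as `Vec3`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of flatbuffers.h): assemble a 32-bit value
// from little-endian bytes. The result is the same on any host byte order.
inline uint32_t ReadLittleEndian32(const unsigned char *p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper: detect the host byte order at run time by looking at
// the first byte of a 16-bit value holding 1.
inline bool HostIsLittleEndian() {
  const uint16_t probe = 1;
  unsigned char first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Stand-in for a struct of three floats. Every member is 4 bytes wide, so
// no padding is expected and the whole struct occupies 12 bytes.
struct PlainVec3 {
  float x, y, z;
};
static_assert(sizeof(PlainVec3) == 12, "three packed floats expected");

// Copy a struct straight out of a byte buffer with memcpy. This is only
// meaningful when the host is little endian, so refuse otherwise.
inline bool CopyVec3(const unsigned char *buf, PlainVec3 *out) {
  if (!HostIsLittleEndian()) return false;
  std::memcpy(out, buf, sizeof(PlainVec3));
  return true;
}
```

The byte-by-byte read is the portable path that an accessor conceptually takes, while `CopyVec3` is the shortcut that the section warns is only safe on little-endian targets.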
--- a/docs/source/FlatBuffers.md
+++ b/docs/source/FlatBuffers.md
@@ -0,0 +1,126 @@
+# FlatBuffers
+
+FlatBuffers is an efficient cross platform serialization library for C++ and
+Java. It was created at Google specifically for game development and other
+performance-critical applications.
+
+It is available as open source under the Apache license, v2 (see LICENSE.txt).
+
+## Why use FlatBuffers?
+
+-   **Access to serialized data without parsing/unpacking** - What sets
+    FlatBuffers apart is that it represents hierarchical data in a flat
+    binary buffer in such a way that it can still be accessed directly
+    without parsing/unpacking, while also still supporting data
+    structure evolution (forwards/backwards compatibility).
+
+-   **Memory efficiency and speed** - The only memory needed to access
+    your data is that of the buffer. It requires 0 additional allocations.
+    FlatBuffers is also very
+    suitable for use with mmap (or streaming), requiring only part of the
+    buffer to be in memory. Access is close to the speed of raw
+    struct access with only one extra indirection (a kind of vtable) to
+    allow for format evolution and optional fields. It is aimed at
+    projects where spending time and space (many memory allocations) to
+    be able to access or construct serialized data is undesirable, such
+    as in games or any other performance sensitive applications. See the
+    [benchmarks](md__benchmarks.html) for details.
+
+-   **Flexible** - Optional fields mean that not only do you get great
+    forwards and backwards compatibility (increasingly important for
+    long-lived games: you don't have to update all data with each new
+    version!), you also have a lot of choice in what data you
+    write and what data you don't, and how you design data structures.
+
+-   **Tiny code footprint** - Small amounts of generated code, and just
+    a single small header as the minimum dependency, which is very easy
+    to integrate. Again, see the benchmark section for details.
+
+-   **Strongly typed** - Errors happen at compile time rather than
+    manually having to write repetitive and error prone run-time checks.
+    Useful code can be generated for you.
+
+-   **Convenient to use** - Generated C++ code allows for terse access
+    & construction code. Then there's optional functionality for parsing
+    schemas and JSON-like text representations at runtime efficiently if
+    needed (faster and more memory efficient than other JSON
+    parsers).
+
+    Java code supports object-reuse.
+
+-   **Cross platform C++11/Java code with no dependencies** - will work with
+    any recent gcc/clang and VS2010. Comes with build files for the tests &
+    samples (Android .mk files, and cmake for all other platforms).
+
+### Why not use Protocol Buffers, or .. ?
+
+Protocol Buffers is indeed relatively similar to FlatBuffers,
+with the primary difference being that FlatBuffers does not need a parsing/
+unpacking step to a secondary representation before you can
+access data, often coupled with per-object memory allocation. The Protocol
+Buffers code is an order of magnitude bigger, too. Protocol Buffers has
+neither optional text import/export nor schema language features like unions.
+
+### But all the cool kids use JSON!
+
+JSON is very readable (which is why we use it as our optional text
+format) and very convenient when used together with dynamically typed
+languages (such as JavaScript). When serializing data from statically
+typed languages, however, JSON not only has the obvious drawback of runtime
+inefficiency, but also forces you to write *more* code to access data
+(counterintuitively) due to its dynamic-typing serialization system.
+In this context, it is only a better choice for systems that have very
+little to no information ahead of time about what data needs to be stored.
+
+Read more about the "why" of FlatBuffers in the
+[white paper](md__white_paper.html).
+
+## Usage in brief
+
+This section is a quick rundown of how to use this system. Subsequent
+sections provide a more in-depth usage guide.
+
+-   Write a schema file that allows you to define the data structures
+    you may want to serialize. Fields can have a scalar type
+    (ints/floats of all sizes), or they can be a: string; array of any type;
+    reference to yet another object; or, a set of possible objects (unions).
+    Fields are optional and have defaults, so they don't need to be
+    present for every object instance.
+
+-   Use `flatc` (the FlatBuffer compiler) to generate a C++ header (or Java
+    classes) with helper classes to access and construct serialized data. This
+    header (say `mydata_generated.h`) only depends on `flatbuffers.h`, which
+    defines the core functionality.
+
+-   Use the `FlatBufferBuilder` class to construct a flat binary buffer.
+    The generated functions allow you to add objects to this
+    buffer recursively, often as simply as making a single function call.
+
+-   Store or send your buffer somewhere!
+
+-   When reading it back, you can obtain the pointer to the root object
+    from the binary buffer, and from there traverse it conveniently
+    in-place with `object->field()`.
+
+## In-depth documentation
+
+-   How to [build the compiler](md__building.html) and samples on various
+    platforms.
+-   How to [use the compiler](md__compiler.html).
+-   How to [write a schema](md__schemas.html).
+-   How to [use the generated C++ code](md__cpp_usage.html) in your own
+    programs.
+-   How to [use the generated Java code](md__java_usage.html) in your own
+    programs.
+-   Some [benchmarks](md__benchmarks.html) showing the advantage of using
+    FlatBuffers.
+-   A [white paper](md__white_paper.html) explaining the "why" of FlatBuffers.
+-   A description of the [internals](md__internals.html) of FlatBuffers.
+-   A formal [grammar](md__grammar.html) of the schema language.
+
+## Online resources
+
+-   [github repository](http://github.com/google/flatbuffers)
+-   [landing page](http://google.github.io/flatbuffers)
+-   [FlatBuffers Google Group](http://group.google.com/group/flatbuffers)
+-   [FlatBuffers Issues Tracker](http://github.com/google/flatbuffers/issues)
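Note on the "one extra indirection (a kind of vtable)" mentioned under "Memory efficiency and speed" in FlatBuffers.md above: the idea can be pictured with a toy model. The sketch below is an assumption-laden illustration, not the real binary layout or API; `ToyTable`, `GetField`, and `MakeToyMonster` are hypothetical names, and the defaults 150 and 100 are taken from the `mana` and `hp` accessors shown elsewhere in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model only: the real FlatBuffers layout keeps everything in one flat
// buffer. Here the vtable and the field storage are separate vectors.
struct ToyTable {
  std::vector<uint16_t> vtable;     // per-field byte offset into data; 0 = absent
  std::vector<unsigned char> data;  // field storage; offset 0 is never a field

  // Return the field in `slot`, or `def` when the field was never written
  // or when the vtable is too short (data written against an older schema).
  int16_t GetField(size_t slot, int16_t def) const {
    if (slot >= vtable.size() || vtable[slot] == 0) return def;
    int16_t value;
    std::memcpy(&value, &data[vtable[slot]], sizeof(value));
    return value;
  }
};

// Build a table that stores only `hp` (slot 1); `mana` (slot 0) stays absent.
inline ToyTable MakeToyMonster(int16_t hp) {
  ToyTable t;
  t.vtable = {0, 2};  // slot 0: absent, slot 1: stored at byte offset 2
  t.data.resize(4);
  std::memcpy(&t.data[2], &hp, sizeof(hp));
  return t;
}
```

Reading slot 0 of `MakeToyMonster(80)` with a default of 150 yields 150 without any bytes being stored for it, which is how optional fields with defaults avoid costing space; asking for a slot beyond the vtable behaves the same way, which is what allows newer code to read older data.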
--- a/docs/source/Grammar.md
+++ b/docs/source/Grammar.md
@@ -0,0 +1,30 @@
+# Formal Grammar of the schema language
+
+schema = namespace\_decl | type\_decl | enum\_decl | root\_decl | object
+
+namespace\_decl = `namespace` ident ( `.` ident )* `;`
+
+type\_decl = ( `table` | `struct` ) ident metadata `{` field\_decl+ `}`
+
+enum\_decl = ( `enum` | `union` ) ident [ `:` type ] metadata `{` commasep(
+enumval\_decl ) `}`
+
+root\_decl = `root_type` ident `;`
+
+field\_decl = ident `:` type [ `=` scalar ] metadata `;`
+
+type = `bool` | `byte` | `ubyte` | `short` | `ushort` | `int` | `uint` |
+`float` | `long` | `ulong` | `double`
+ | `string` | `[` type `]` | ident
+
+enumval\_decl = ident [ `=` integer\_constant ]
+
+metadata = [ `(` commasep( ident [ `:` scalar ] ) `)` ]
+
+scalar = integer\_constant | float\_constant | `true` | `false`
+
+object = { commasep( ident `:` value ) }
+
+value = scalar | object | string\_constant | `[` commasep( value ) `]`
+
+commasep(x) = [ x ( `,` x )\* ]
--- a/docs/source/Internals.md
+++ b/docs/source/Internals.md
@@ -0,0 +1,244 @@
+# FlatBuffer Internals
+
+This section is entirely optional for the use of FlatBuffers. In normal
+usage, you should never need the information contained herein. If you're
+interested however, it should give you more of an appreciation of why
+FlatBuffers is both efficient and convenient.
+
+### Format components
+
+A FlatBuffer is a binary file and in-memory format consisting mostly of
+scalars of various sizes, all aligned to their own size. Each scalar is
+also always represented in little-endian format, as this corresponds to
+all commonly used CPUs today. FlatBuffers will also work on big-endian
+machines, but will be slightly slower because of additional
+byte-swap intrinsics.
+
+On purpose, the format leaves a lot of details about where exactly
+things live in memory undefined, e.g. fields in a table can have any
+order, and objects to some extent can be stored in many orders. This is
+because the format doesn't need this information to be efficient, and it
+leaves room for optimization and extension (for example, fields can be
+packed in a way that is most compact). Instead, the format is defined in
+terms of offsets and adjacency only.
+
+### Format identification
+
+The format also doesn't contain information for format identification
+and versioning, which is also by design. FlatBuffers is a statically typed
+system, meaning the user of a buffer needs to know what kind of buffer
+it is. FlatBuffers can of course be wrapped inside other containers
+where needed, or you can use its union feature to dynamically identify
+multiple possible sub-objects stored. Additionally, it can be used
+together with the schema parser if full reflective capabilities are
+desired.
+
+Versioning is something that is intrinsically part of the format (the
+optionality / extensibility of fields), so the format itself does not
+need a version number (it's a meta-format, in a sense). We're hoping
+that this format can accommodate all data needed. If format breaking
+changes are ever necessary, it would become a new kind of format rather
+than just a variation.
+
+### Offsets
+
+The most important and generic offset type (see `flatbuffers.h`) is
+`offset_t`, which is currently always a `uint32_t`, and is used to
+refer to all tables/unions/strings/vectors. 32bit is
+intentional, since we want to keep the format binary compatible between
+32 and 64bit systems, and a 64bit offset would bloat the size for almost
+all uses. A version of this format with 64bit (or 16bit) offsets is easy to set
+when needed. Unsigned means they can only point in one direction, which
+typically is forward (towards a higher memory location). Any backwards
+offsets will be explicitly marked as such.
+
+The format starts with an `offset_t` to the root object in the buffer.
+
+We have two kinds of objects, structs and tables.
+
+### Structs
+
+These are the simplest, and as mentioned, intended for simple data that
+benefits from being extra efficient and doesn't need versioning /
+extensibility. They are always stored inline in their parent (a struct,
+table, or vector) for maximum compactness. Structs define a consistent
+memory layout where all components are aligned to their size, and
+structs aligned to their largest scalar member. This is done independently
+of the alignment rules of the underlying compiler to guarantee a cross
+platform compatible layout. This layout is then enforced in the generated
+code.
+
+### Tables
+
+These start with an `soffset_t` to a vtable (signed version of
+`offset_t`, since vtables may be stored anywhere), followed by all the
+fields as aligned scalars. Unlike structs, not all fields need to be
+present. There is no set order and layout.
+
+To be able to access fields regardless of these uncertainties, we go
+through a vtable of offsets. Vtables are shared between any objects that
+happen to have the same vtable values.
+
+The elements of a vtable are all of type `voffset_t`, which is currently
+a `uint16_t`. The first element is the number of elements of the vtable,
+including this one. The second one is the size of the object, in bytes
+(including the vtable offset). This size is used for streaming, to know
+how many bytes to read to be able to access all fields of the object.
+The remaining elements are the N offsets, where N is the number of fields
+declared in the schema when the code that constructed this buffer was
+compiled (thus, the size of the vtable is N + 2 elements).
+
+All accessor functions in the generated code for tables contain the
+offset into this table as a constant. This offset is checked against the
+first field (the number of elements), to protect against newer code
+reading older data. If this offset is out of range, or the vtable entry
+is 0, that means the field is not present in this object, and the
+default value is returned. Otherwise, the entry is used as offset to the
+field to be read.
+
+### Strings and Vectors
+
+Strings are simply a vector of bytes, and are always
+null-terminated. Vectors are stored as contiguous aligned scalar
+elements prefixed by a count.
+
+### Construction
+
+The current implementation constructs these buffers backwards, since
+that significantly reduces the amount of bookkeeping and simplifies the
+construction API.
+
+### Code example
+
+Here's an example of the code that gets generated for the `samples/monster.fbs`.
+What follows is the entire file, broken up by comments:
+
+    // automatically generated, do not modify
+
+    #include "flatbuffers/flatbuffers.h"
+
+    namespace MyGame {
+    namespace Sample {
+
+Nested namespace support.
+
+    enum {
+      Color_Red = 0,
+      Color_Green = 1,
+      Color_Blue = 2,
+    };
+
+    inline const char **EnumNamesColor() {
+      static const char *names[] = { "Red", "Green", "Blue", nullptr };
+      return names;
+    }
+
+    inline const char *EnumNameColor(int e) { return EnumNamesColor()[e]; }
+
+Enums and convenient reverse lookup.
+
+    enum {
+      Any_NONE = 0,
+      Any_Monster = 1,
+    };
+
+    inline const char **EnumNamesAny() {
+      static const char *names[] = { "NONE", "Monster", nullptr };
+      return names;
+    }
+
+    inline const char *EnumNameAny(int e) { return EnumNamesAny()[e]; }
+
+Unions share a lot with enums.
+
+    struct Vec3;
+    struct Monster;
+
+Predeclare all datatypes since there may be circular references.
+
+    MANUALLY_ALIGNED_STRUCT(4) Vec3 {
+     private:
+      float x_;
+      float y_;
+      float z_;
+
+     public:
+      Vec3(float x, float y, float z)
+        : x_(flatbuffers::EndianScalar(x)), y_(flatbuffers::EndianScalar(y)), z_(flatbuffers::EndianScalar(z)) {}
+
+      float x() const { return flatbuffers::EndianScalar(x_); }
+      float y() const { return flatbuffers::EndianScalar(y_); }
+      float z() const { return flatbuffers::EndianScalar(z_); }
+    };
+    STRUCT_END(Vec3, 12);
+
+These ugly macros do a couple of things: they turn off any padding the compiler
+might normally do, since we add padding manually (though none in this example),
+and they enforce alignment chosen by FlatBuffers. This ensures the layout of
+this struct will look the same regardless of compiler and platform. Note that
+the fields are private: this is because these store little endian scalars
+regardless of platform (since this is part of the serialized data).
+`EndianScalar` then converts back and forth, which is a no-op on all current
+mobile and desktop platforms, and a single machine instruction on the few
+remaining big endian platforms.
+
+    struct Monster : private flatbuffers::Table {
+      const Vec3 *pos() const { return GetStruct<const Vec3 *>(4); }
+      int16_t mana() const { return GetField<int16_t>(6, 150); }
+      int16_t hp() const { return GetField<int16_t>(8, 100); }
+      const flatbuffers::String *name() const { return GetPointer<const flatbuffers::String *>(10); }
+      const flatbuffers::Vector<uint8_t> *inventory() const { return GetPointer<const flatbuffers::Vector<uint8_t> *>(14); }
+      int8_t color() const { return GetField<int8_t>(16, 2); }
+    };
+
+Tables are a bit more complicated. A table accessor struct is used to point at
+the serialized data for a table, which always starts with an offset to its
+vtable. It derives from `Table`, which contains the `GetField` helper functions.
+`GetField` takes a vtable offset and a default value. It will look in the vtable
+at that offset. If the offset is out of bounds (data from an older version) or
+the vtable entry is 0, the field is not present and the default is returned.
+Otherwise, it uses the entry as an offset into the table to locate the field.
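+
+The lookup described above can be sketched in plain C++ as follows. This is not
+the library's implementation: `GetFieldSketch`, `ReadScalar`, `WriteScalar` and
+`MakeSampleBuffer` are hypothetical names, the exact vtable layout (two 16-bit
+size fields followed by one 16-bit entry per field) is an assumption made for
+the sketch, and bytes are read in host order for brevity.
+
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads/writes a T in host byte order (a real reader would convert from
// little endian here).
template <typename T> T ReadScalar(const uint8_t *p) {
  T v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
template <typename T> void WriteScalar(uint8_t *p, T v) {
  std::memcpy(p, &v, sizeof v);
}

// Layout ASSUMED for this sketch:
//   vtable: [uint16 vtable size][uint16 table size][uint16 entry per field...]
//   table:  [int32 distance back to its vtable][field data...]
template <typename T>
T GetFieldSketch(const uint8_t *table, uint16_t vtable_offset, T default_value) {
  assert(table != nullptr);
  const uint8_t *vtable = table - ReadScalar<int32_t>(table);
  const uint16_t vtable_size = ReadScalar<uint16_t>(vtable);
  // Out of bounds: the data was written by an older schema without this field.
  if (vtable_offset >= vtable_size) return default_value;
  const uint16_t field_offset = ReadScalar<uint16_t>(vtable + vtable_offset);
  // Zero entry: the field was not stored for this particular table.
  if (field_offset == 0) return default_value;
  return ReadScalar<T>(table + field_offset);
}

// Hand-built sample: a vtable with three entries, of which only `hp` is present.
const size_t kSampleTableStart = 12;
inline std::vector<uint8_t> MakeSampleBuffer() {
  std::vector<uint8_t> buf(20, 0);
  WriteScalar<uint16_t>(&buf[0], 10);  // vtable size in bytes
  WriteScalar<uint16_t>(&buf[2], 8);   // table size in bytes
  WriteScalar<uint16_t>(&buf[4], 0);   // vtable offset 4 (pos): absent
  WriteScalar<uint16_t>(&buf[6], 0);   // vtable offset 6 (mana): absent
  WriteScalar<uint16_t>(&buf[8], 4);   // vtable offset 8 (hp): 4 bytes into table
  WriteScalar<int32_t>(&buf[kSampleTableStart], 12);      // back to the vtable
  WriteScalar<int16_t>(&buf[kSampleTableStart + 4], 80);  // hp value
  return buf;
}
```
+
+With this sample buffer, reading offset 8 yields the stored `hp`, while offset 6
+(zero entry) and offset 16 (beyond the vtable) both fall back to the default
+that the caller passes in.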
+
+    struct MonsterBuilder {
+      flatbuffers::FlatBufferBuilder &fbb_;
+      flatbuffers::uoffset_t start_;
+      void add_pos(const Vec3 *pos) { fbb_.AddStruct(4, pos); }
+      void add_mana(int16_t mana) { fbb_.AddElement<int16_t>(6, mana, 150); }
+      void add_hp(int16_t hp) { fbb_.AddElement<int16_t>(8, hp, 100); }
+      void add_name(flatbuffers::Offset<flatbuffers::String> name) { fbb_.AddOffset(10, name); }
+      void add_inventory(flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory) { fbb_.AddOffset(14, inventory); }
+      void add_color(int8_t color) { fbb_.AddElement<int8_t>(16, color, 2); }
+      MonsterBuilder(flatbuffers::FlatBufferBuilder &_fbb) : fbb_(_fbb) { start_ = fbb_.StartTable(); }
+      flatbuffers::Offset<Monster> Finish() { return flatbuffers::Offset<Monster>(fbb_.EndTable(start_, 7)); }
+    };
+
+`MonsterBuilder` is the base helper struct to construct a table using a
+`FlatBufferBuilder`. You can add the fields in any order, and the `Finish`
+call will ensure the correct vtable gets generated.
+
+    inline flatbuffers::Offset<Monster> CreateMonster(flatbuffers::FlatBufferBuilder &_fbb, const Vec3 *pos, int16_t mana, int16_t hp, flatbuffers::Offset<flatbuffers::String> name, flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory, int8_t color) {
+      MonsterBuilder builder_(_fbb);
+      builder_.add_inventory(inventory);
+      builder_.add_name(name);
+      builder_.add_pos(pos);
+      builder_.add_hp(hp);
+      builder_.add_mana(mana);
+      builder_.add_color(color);
+      return builder_.Finish();
+    }
+
+`CreateMonster` is a convenience function that calls all functions in
+`MonsterBuilder` above for you. Note that if you pass values which are
+defaults as arguments, it will not actually construct that field, so
+you can probably use this function instead of the builder class in
+almost all cases.
+
+    inline const Monster *GetMonster(const void *buf) { return flatbuffers::GetRoot<Monster>(buf); }
+
+This function is only generated for the root table type, to be able to
+start traversing a FlatBuffer from a raw buffer pointer.
+
+    }; // namespace MyGame
+    }; // namespace Sample
+
+
--- a/docs/source/JavaUsage.md
+++ b/docs/source/JavaUsage.md
@@ -0,0 +1,79 @@
+# Use in Java
+
+There's experimental support for reading FlatBuffers in Java. Generate code
+for Java with the `-j` option to `flatc`.
+
+See `javaTest.java` for an example. Essentially, you read a FlatBuffer binary
+file into a `byte[]`, which you then turn into a `ByteBuffer`, which you pass to
+the `getRootAsMonster` function:
+
+    ByteBuffer bb = ByteBuffer.wrap(data);
+    Monster monster = Monster.getRootAsMonster(bb);
+
+Now you can access values much like C++:
+
+    short hp = monster.hp();
+    Vec3 pos = monster.pos();
+
+Note that whenever you access a new object like in the `pos` example above,
+a new temporary accessor object gets created. If your code is very performance
+sensitive (you iterate through a lot of objects), there's a second `pos()`
+method to which you can pass a `Vec3` object you've already created. This allows
+you to reuse it across many calls and reduce the amount of object allocation (and
+thus garbage collection) your program does.
+
+Sadly the string accessors currently always create a new string when accessed,
+since FlatBuffers' UTF-8 strings can't be read in-place by Java.
+
+Vector access is also a bit different from C++: you pass an extra index
+to the vector field accessor. Then a second method with the same name
+suffixed by `_length` lets you know the number of elements you can access:
+
+    for (int i = 0; i < monster.inventory_length(); i++)
+        monster.inventory(i); // do something here
+
+You can also construct these buffers in Java using the static methods found
+in the generated code, and the FlatBufferBuilder class:
+
+    FlatBufferBuilder fbb = new FlatBufferBuilder();
+
+Create strings:
+
+    int str = fbb.createString("MyMonster");
+
+Create a table with a struct contained therein:
+
+    Monster.startMonster(fbb);
+    Monster.addPos(fbb, Vec3.createVec3(fbb, 1.0f, 2.0f, 3.0f, 3.0, (byte)4, (short)5, (byte)6));
+    Monster.addHp(fbb, (short)80);
+    Monster.addName(fbb, str);
+    Monster.addInventory(fbb, inv);
+    Monster.addTest_type(fbb, (byte)1);
+    Monster.addTest(fbb, mon2);
+    Monster.addTest4(fbb, test4s);
+    int mon = Monster.endMonster(fbb);
+
+As you can see, the Java code for tables does not use a convenient
+`createMonster` call like the C++ code. This is to create the buffer without
+using temporary object allocation (since the `Vec3` is an inline component of
+`Monster`, it has to be created right where it is added, whereas the name and
+the inventory are not inline).
+Structs do have convenient methods that even have arguments for nested structs.
+
+Vectors also use this start/end pattern to allow vectors of both scalar types
+and structs:
+
+    Monster.startInventoryVector(fbb, 5);
+    for (byte i = 4; i >= 0; i--) fbb.addByte(i);
+    int inv = fbb.endVector();
+
+You can use the generated method `startInventoryVector` to conveniently call
+`startVector` with the right element size. You pass the number of
+elements you want to write. You write the elements backwards since the buffer
+is being constructed back to front.
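+
+Why the elements go in backwards can be seen in a miniature C++ model of a
+buffer that is filled from its end towards its start. `BackToFrontBuilder` and
+`BuildInventory` are hypothetical names for this sketch only. The real
+`FlatBufferBuilder` also handles growth, alignment and length prefixes, none of
+which are modelled here.
+
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature builder (not the FlatBuffers API): it fills a
// fixed-size byte array from the end towards the start.
class BackToFrontBuilder {
 public:
  explicit BackToFrontBuilder(size_t capacity)
      : buf_(capacity), head_(capacity) {}

  // Places one byte immediately in front of everything written so far.
  void AddByte(uint8_t b) {
    assert(head_ > 0 && "this sketch does not grow the buffer");
    buf_[--head_] = b;
  }

  // The written bytes in the order a reader sees them (front to back).
  std::vector<uint8_t> Finished() const {
    return std::vector<uint8_t>(
        buf_.begin() + static_cast<std::ptrdiff_t>(head_), buf_.end());
  }

 private:
  std::vector<uint8_t> buf_;
  size_t head_;  // index of the first byte written so far
};

// Mirrors the Java loop above: add 4, 3, 2, 1, 0 so that readers see 0..4.
inline std::vector<uint8_t> BuildInventory() {
  BackToFrontBuilder builder(16);
  for (int i = 4; i >= 0; i--) builder.AddByte(static_cast<uint8_t>(i));
  return builder.Finished();
}
```
+
+Because every new byte lands in front of the previous ones, the last element
+added becomes the first element read.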
+
+## Text Parsing
+
+There currently is no support for parsing text (schemas and JSON) directly
+from Java, though you could use the C++ parser through JNI. Please see the
+C++ documentation for more on text parsing.
--- a/docs/source/Schemas.md
+++ b/docs/source/Schemas.md
@@ -0,0 +1,198 @@
+# Writing a schema
+
+The syntax of the schema language (aka IDL, Interface Definition
+Language) should look quite familiar to users of any of the C family of
+languages, and also to users of other IDLs. Let's look at an example
+first:
+
+    // example IDL file
+
+    namespace MyGame;
+
+    enum Color : byte { Red = 1, Green, Blue }
+
+    union Any { Monster, Weapon, Pickup }
+
+    struct Vec3 {
+      x:float;
+      y:float;
+      z:float;
+    }
+
+    table Monster {
+      pos:Vec3;
+      mana:short = 150;
+      hp:short = 100;
+      name:string;
+      friendly:bool = false (deprecated, priority: 1);
+      inventory:[ubyte];
+      color:Color = Blue;
+      test:Any;
+    }
+
+    root_type Monster;
+
+(Weapon & Pickup not defined as part of this example).
+
+### Tables
+
+Tables are the main way of defining objects in FlatBuffers, and consist
+of a name (here `Monster`) and a list of fields. Each field has a name,
+a type, and optionally a default value (if omitted, it defaults to 0 /
+NULL).
+
+Each field is optional: It does not have to appear in the wire
+representation, and you can choose to omit fields for each individual
+object. As a result, you have the flexibility to add fields without fear of
+bloating your data. This design is also FlatBuffers' mechanism for forward
+and backwards compatibility. Note that:
+
+-   You can add new fields in the schema ONLY at the end of a table
+    definition. Older data will still
+    read correctly, and give you the default value when read. Older code
+    will simply ignore the new field.
+
+-   You cannot delete fields you don't use anymore from the schema,
+    but you can simply
+    stop writing them into your data for almost the same effect.
+    Additionally you can mark them as `deprecated` as in the example
+    above, which will prevent the generation of accessors in the
+    generated C++, as a way to enforce the field not being used any more.
+    (careful: this may break code!).
+
+-   You may change field names and table names, if you're ok with your
+    code breaking until you've renamed them there too.
+
+
+
+### Structs
+
+Similar to a table, only now none of the fields are optional (so no defaults
+either), and fields may not be added or be deprecated. Structs may only contain
+scalars or other structs. Use this for
+simple objects where you are very sure no changes will ever be made
+(as is quite clear in the example `Vec3`). Structs use less memory than
+tables and are even faster to access (they are always stored in-line in their
+parent object, and use no virtual table).
+
+### Types
+
+Builtin scalar types are:
+
+-   8 bit: `byte ubyte bool`
+
+-   16 bit: `short ushort`
+
+-   32 bit: `int uint float`
+
+-   64 bit: `long ulong double`
+
+Builtin non-scalar types are:
+
+-   Vector of any other type (denoted with `[type]`). Nesting vectors
+    requires that you wrap the inner vector in a table rather than
+    writing `[[type]]`.
+
+-   `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
+    or general binary data use vectors (`[byte]` or `[ubyte]`) instead.
+
+-   References to other tables or structs, enums or unions (see
+    below).
+
+You can't change types of fields once they're used, with the exception
+of same-size data where a `reinterpret_cast` would give you a desirable result,
+e.g. you could change a `uint` to an `int` if no values in current data use the
+high bit yet.
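+
+As a small illustration of the `uint` to `int` case, the following C++ sketch
+reinterprets the same 32 stored bits under the new type. `ReadAsInt` and
+`CheckedReadAsInt` are hypothetical helpers, not generated code.
+
```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets the 32 bits stored for a `uint` field as an `int`.
inline int32_t ReadAsInt(uint32_t stored) {
  int32_t v;
  std::memcpy(&v, &stored, sizeof v);
  return v;
}

// Same, but asserts the precondition from the text: the high bit is unused,
// so the value means the same thing under both types.
inline int32_t CheckedReadAsInt(uint32_t stored) {
  assert((stored & 0x80000000u) == 0 && "value would turn negative");
  return ReadAsInt(stored);
}
```
+
+A stored value such as `42` reads back unchanged, whereas a value with the high
+bit set would read back as a negative number, which is why the change is only
+safe while no existing data uses that bit.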
+
+### (Default) Values
+
+Values are a sequence of digits, optionally followed by a `.` and more digits
+for float constants, and optionally prefixed by a `-`. Non-scalar defaults are
+currently not supported (always NULL).
+
+You generally do not want to change default values after they're initially
+defined. Fields that have the default value are not actually stored in the
+serialized data but are generated in code, so when you change the default, you'd
+now get a different value than from code generated from an older version of
+the schema. There are situations however where this may be
+desirable, especially if you can ensure a simultaneous rebuild of
+all code.
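+
+The effect of changing a default can be modelled in a few lines of C++. This is
+a deliberately simplified stand-in (a map instead of a real buffer), and
+`StoredFields`, `AddFieldSketch` and `GetFieldSketch` are hypothetical names.
+
```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for one serialized table: field id -> stored value.
typedef std::map<int, int16_t> StoredFields;

// Writer side: a value equal to the writer's schema default is not stored.
inline void AddFieldSketch(StoredFields &table, int id, int16_t value,
                           int16_t schema_default) {
  assert(id >= 0);
  if (value != schema_default) table[id] = value;
}

// Reader side: an absent field yields the default compiled into the reader.
inline int16_t GetFieldSketch(const StoredFields &table, int id,
                              int16_t schema_default) {
  assert(id >= 0);
  StoredFields::const_iterator it = table.find(id);
  return it == table.end() ? schema_default : it->second;
}
```
+
+If a writer built with `hp:short = 100` adds an `hp` of 100, nothing is stored.
+A reader built with the same default gets 100 back, but a reader built after
+the default was changed to 50 would see 50 for the very same data.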
+
+### Enums
+
+Define a sequence of named constants, each with a given value, or
+increasing by one from the previous one. The default first value
+is `0`. As you can see in the enum declaration, you specify the underlying
+integral type of the enum with `:` (in this case `byte`), which then determines
+the type of any fields declared with this enum type. If you omit the underlying
+type, it will be `short`.
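+
+For readers who think in C++, the `Color` declaration from the example behaves
+much like the enum below. This is only an analogy written for this section: the
+`Color_` prefix and the `ColorName` helper are assumptions and are not
+necessarily what the schema compiler emits.
+
```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Plain C++ analogue of `enum Color : byte { Red = 1, Green, Blue }`.
enum Color : int8_t { Color_Red = 1, Color_Green, Color_Blue };

// Values without an explicit number increase by one from the previous one.
static_assert(Color_Green == 2 && Color_Blue == 3, "sequential numbering");
// The underlying type determines the storage size of fields of this type.
static_assert(sizeof(Color) == 1, "byte-sized enum");

// Hypothetical name lookup, bounds-checked because the enum starts at 1.
inline const char *ColorName(Color c) {
  static const char *const names[] = { "Red", "Green", "Blue" };
  assert(c >= Color_Red && c <= Color_Blue);
  return names[c - Color_Red];
}
```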
+
+### Unions
+
+Unions share a lot of properties with enums, but instead of new names
+for constants, you use names of tables. You can then declare
+a union field which can hold a reference to any of those types, and
+additionally a hidden field with the suffix `_type` is generated that
+holds the corresponding enum value, allowing you to know which type to
+cast to at runtime.
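+
+The pairing of a `_type` value with an untyped reference can be sketched in C++
+like this. All names here (`AnyType`, `MonsterData`, `WeaponData`, `Holder`,
+`AsMonster`) are hypothetical and only model the idea. The generated accessors
+look different.
+
```cpp
#include <cassert>

// One enum value per union member, plus NONE for "no value".
enum AnyType { Any_NONE = 0, Any_Monster = 1, Any_Weapon = 2 };

struct MonsterData { int hp; };
struct WeaponData { int damage; };

// Models the two fields behind a union field named `test`:
// the hidden `test_type` tag and the reference itself.
struct Holder {
  AnyType test_type;
  const void *test;
};

// Casts only when the tag says the reference really is a monster.
inline const MonsterData *AsMonster(const Holder &h) {
  assert(h.test_type == Any_NONE || h.test != nullptr);
  return h.test_type == Any_Monster
             ? static_cast<const MonsterData *>(h.test)
             : nullptr;
}
```
+
+Checking the tag before casting is what keeps the cast safe: a `Holder` tagged
+`Any_Weapon` makes `AsMonster` return a null pointer instead of a misread object.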
+
+### Namespaces
+
+These will generate the corresponding namespace in C++ for all helper
+code, and packages in Java. You can use `.` to specify nested namespaces /
+packages.
+
+### Root type
+
+This declares what you consider to be the root table (or struct) of the
+serialized data.
+
+### Comments & documentation
+
+May be written as in most C-based languages. Additionally, a triple-slash
+comment (`///`) on a line by itself signals that a comment is documentation
+for whatever is declared on the line after it
+(table/struct/field/enum/union/element), and the comment is output
+in the corresponding C++ code. Multiple such lines per item are allowed.
+
+### Attributes
+
+Attributes may be attached to a declaration, behind a field, or after
+the name of a table/struct/enum/union. These may either have a value or
+not. Some attributes like `deprecated` are understood by the compiler,
+others are simply ignored (like `priority`), but are available to query
+if you parse the schema at runtime.
+This is useful if you write your own code generators/editors etc., and
+you wish to add additional information specific to your tool (such as a
+help text).
+
+Currently understood attributes:
+
+-   `deprecated` (on a field): do not generate accessors for this field
+    anymore, code should stop using this data.
+-   `original_order` (on a table): since elements in a table do not need
+    to be stored in any particular order, they are often optimized for
+    space by sorting them to size. This attribute stops that from happening.
+-   `force_align: size` (on a struct): force the alignment of this struct
+    to be something higher than what it is naturally aligned to. Causes
+    these structs to be aligned to that amount inside a buffer, IF that
+    buffer is allocated with that alignment (which is not necessarily
+    the case for buffers accessed directly inside a `FlatBufferBuilder`).
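+
+To see why sorting fields by size can save space (the optimization that
+`original_order` turns off), consider this small C++ sketch. It assumes each
+scalar is aligned to its own size, and `LaidOutSize` is a hypothetical helper
+rather than anything the schema compiler exposes.
+
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: total bytes used when scalar fields are laid out in the
// given order, each aligned to its own size (sizes must be powers of two).
inline size_t LaidOutSize(const std::vector<size_t> &field_sizes) {
  size_t offset = 0;
  for (size_t i = 0; i < field_sizes.size(); i++) {
    const size_t size = field_sizes[i];
    assert(size != 0 && (size & (size - 1)) == 0);
    offset = (offset + size - 1) & ~(size - 1);  // pad up to the alignment
    offset += size;
  }
  return offset;
}
```
+
+Under that assumption, fields of 1, 8, 2 and 4 bytes stored in declaration
+order occupy 24 bytes including padding, while the same fields ordered
+8, 4, 2, 1 occupy 15.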
+
+## Gotchas
+
+### Schemas and version control
+
+FlatBuffers relies on new field declarations being added at the end, and on
+earlier declarations not being removed but marked deprecated when needed. We think
+this is an improvement over the manual number assignment that happens in
+Protocol Buffers.
+
+One place where this is possibly problematic however is source control. If user
+A adds a field, generates new binary data with this new schema, then tries to
+commit both to source control after user B already committed a new field also,
+and just auto-merges the schema, the binary files are now invalid compared to
+the new schema.
+
+The solution of course is that you should not be generating binary data before
+your schema changes have been committed, ensuring consistency with the rest of
+the world.
+
--- a/docs/source/WhitePaper.md
+++ b/docs/source/WhitePaper.md
@@ -0,0 +1,127 @@
+# FlatBuffers white paper
+
+This document tries to shed some light on the "why" of FlatBuffers, a
+new serialization library.
+
+## Motivation
+
+Back in the good old days, performance was all about instructions and
+cycles. Nowadays, processing units have run so far ahead of the memory
+subsystem, that making an efficient application should start and finish
+with thinking about memory. How much you use of it. How you lay it out
+and access it. How you allocate it. When you copy it.
+
+Serialization is a pervasive activity in a lot of programs, and a common
+source of memory inefficiency, with lots of temporary data structures
+needed to parse and represent data, and inefficient allocation patterns
+and locality.
+
+If it were possible to do serialization with no temporary objects,
+no additional allocation, no copying, and good locality, this could be
+of great value. The reason serialization systems usually don't manage
+this is that it runs counter to forwards/backwards compatibility, and
+platform specifics like endianness and alignment.
+
+FlatBuffers is what you get if you try anyway.
+
+In particular, FlatBuffers' focus is on mobile hardware (where memory
+size and memory bandwidth are even more constrained than on desktop
+hardware), and applications that have the highest performance needs:
+games.
+
+## FlatBuffers
+
+*This is a summary of FlatBuffers functionality, with some rationale.
+A more detailed description can be found in the FlatBuffers
+documentation.*
+
+### Summary
+
+A FlatBuffer is a binary buffer containing nested objects (structs,
+tables, vectors, ...) organized using offsets so that the data can be
+traversed in-place just like any pointer-based data structure. Unlike
+most in-memory data structures however, it uses strict rules of
+alignment and endianness (always little) to ensure these buffers are
+cross platform. Additionally, for objects that are tables, FlatBuffers
+provides forwards/backwards compatibility and general optionality of
+fields, to support most forms of format evolution.
+
+You define your object types in a schema, which can then be compiled to
+C++ or Java for low to zero overhead reading & writing.
+Optionally, JSON data can be dynamically parsed into buffers.
+
+### Tables
+
+Tables are the cornerstone of FlatBuffers, since format evolution is
+essential for most applications of serialization. Typically, dealing
+with format changes is something that can be done transparently during
+the parsing process of most serialization solutions out there.
+But a FlatBuffer isn't parsed before it is accessed.
+
+Tables get around this by using an extra indirection to access fields,
+through a *vtable*. Each table comes with a vtable (which may be shared
+between multiple tables with the same layout), which records where the
+fields for this particular table instance are stored.
+The vtable may also indicate that the field is not present (because this
+FlatBuffer was written with an older version of the software, or simply
+because the information was not necessary for this instance, or deemed
+deprecated), in which case a default value is returned.
+
+Tables have a low overhead in memory (since vtables are small and
+shared) and in access cost (an extra indirection), but provide great
+flexibility. Tables may even cost less memory than the equivalent
+struct, since fields do not need to be stored when they are equal to
+their default.
+
+FlatBuffers additionally offers "naked" structs, which do not offer
+forwards/backwards compatibility, but can be even smaller (useful for
+very small objects that are unlikely to change, e.g. a coordinate
+pair or an RGBA color).
+
+### Schemas
+
+While schemas reduce some generality (you can't just read any data
+without having its schema), they have a lot of upsides:
+
+-   Most information about the format can be factored into the generated
+    code, reducing memory needed to store data, and time to access it.
+
+-   The strong typing of the data definitions means less error
+    checking/handling at runtime (less can go wrong).
+
+-   A schema enables us to access a buffer without parsing.
+
+FlatBuffer schemas are fairly similar to those of the incumbent,
+Protocol Buffers, and generally should be readable to those familiar
+with the C family of languages. We chose to improve upon the features
+offered by .proto files in the following ways:
+
+-   Deprecation of fields instead of manual field id assignment.
+    Extending an object in a .proto means hunting for a free slot among
+    the numbers (preferring lower numbers since they have a more compact
+    representation). Besides being inconvenient, it also makes removing
+    fields problematic: either you keep them, which does not make it
+    obvious that the field shouldn't be read/written anymore and still
+    generates accessors, or you remove them and risk that there is
+    still old data around that uses the field by the time someone
+    reuses its field id, with nasty consequences.
+
+-   Differentiating between tables and structs (see above). Effectively
+    all table fields are `optional`, and all struct fields are
+    `required`.
+
+-   Having a native vector type instead of `repeated`. This gives you a
+    length without having to collect all items, and in the case of
+    scalars provides for a more compact representation, and one that
+    guarantees adjacency.
+
+-   Having a native `union` type instead of using a series of `optional`
+    fields, all of which must be checked individually.
+
+-   Being able to define defaults for all scalars, instead of having to
+    deal with their optionality at each access.
+
+-   A parser that can deal with both schemas and data definitions (JSON
+    compatible) uniformly.
+
+
--- a/docs/source/doxyfile
+++ b/docs/source/doxyfile