Add new Docs source files (#8461)

2026-06-10 07:06:26 +00:00 · 2024-12-23 15:55:56 -08:00
parent c75a0154eb
commit 492475a1b2
10 changed files with 2255 additions and 0 deletions
--- a/docs/source/schema.md
+++ b/docs/source/schema.md
@@ -0,0 +1,650 @@
+# Schema
+
+The syntax of the schema language (aka IDL,
+[Interface Definition Language](https://en.wikipedia.org/wiki/Interface_description_language))
+should look quite familiar to users of any of the C family of languages, and
+also to users of other IDLs. Let's look at an example first:
+
+```c title="monster.fbs" linenums="1"
+// example IDL file
+
+namespace MyGame;
+
+attribute "priority";
+
+enum Color : byte { Red = 1, Green, Blue }
+
+union Any { Monster, Weapon, Pickup }
+
+struct Vec3 {
+  x:float;
+  y:float;
+  z:float;
+}
+
+table Monster {
+  pos:Vec3;
+  mana:short = 150;
+  hp:short = 100;
+  name:string;
+  friendly:bool = false (deprecated, priority: 1);
+  inventory:[ubyte];
+  color:Color = Blue;
+  test:Any;
+}
+
+table Weapon {}
+table Pickup {}
+
+root_type Monster;
+```
+
+## Tables
+
+Tables are the main way of defining objects in FlatBuffers.
+
+```c title="monster.fbs - Example Table" linenums="17"
+table Monster {
+  pos:Vec3;
+  mana:short = 150;
+  hp:short = 100;
+  name:string;
+  friendly:bool = false (deprecated, priority: 1);
+  inventory:[ubyte];
+  color:Color = Blue;
+  test:Any;
+}
+```
+
+They consist of a name (here `Monster`) and a list of [fields](#fields). This
+field list can be appended to (and deprecated from) while still maintaining
+compatibility.
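+
+As a rough sketch of what this looks like in practice, the two revisions below
+show a hypothetical, trimmed-down `Monster` table evolving over time. They are
+successive versions of one file, not declarations meant to coexist in a single
+schema.
+
+```c
+// Revision 1 of the schema.
+table Monster {
+  hp:short = 100;
+  friendly:bool = false;
+}
+
+// Revision 2 of the same schema: `friendly` is deprecated in place rather than
+// deleted, and the new `mana` field is appended after the existing fields.
+table Monster {
+  hp:short = 100;
+  friendly:bool = false (deprecated);
+  mana:short = 150;
+}
+```
+
+Keeping the deprecated field in place and appending new fields at the end
+preserves the implicit field numbering, so data written with either revision
+should stay readable by code generated from the other.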
+
+### Fields
+
+Table fields have a name identifier, a [type](#types), an optional default
+value, and optional [attributes](#attributes), and end with a `;`. See the
+[grammar](grammer.md) for full details.
+
+```ebnf
+field_decl = ident `:` type [ `=` scalar ] metadata `;`
+```
+
+Fields do not have to appear in the wire representation, and you can choose to
+omit fields when constructing an object. You have the flexibility to add fields
+without fear of bloating your data. This design is also FlatBuffers' mechanism
+for forward and backward compatibility.
+
+There are three, mutually exclusive, reactions to the non-presence of a table's
+field in the binary data.
+
+#### 1. Default
+
+Default value fields will return the default value as defined in the schema. If
+the default value is not specified in the schema, it will be `0` for scalar
+types, or `null` for other types.
+
+```c++
+mana:short = 150;
+hp:short;
+inventory:[ubyte];
+```
+
+Here `mana` would default to the value `150`, `hp` to value `0`, and `inventory`
+to `null`, if those fields are not set.
+
+Only scalar values can have explicit defaults; non-scalar fields (strings,
+vectors, tables) are `null` when not present.
+
+This is the normal mode that fields will take.
+
+??? danger "Don't change Default values"
+
+    You generally do not want to change default values after they're initially
+    defined. Fields that have the default value are not actually stored in the
+    serialized data (see also Gotchas below). Values explicitly written by code
+    generated by the old version of the schema, if they happen to be the default,
+    will be read as a different value by code generated with the new schema. This is
+    slightly less bad when converting an optional scalar into a default valued
+    scalar since non-presence would not be overloaded with a previous default value.
+    There are situations, however, where this may be desirable, especially if you
+    can ensure a simultaneous rebuild of all code.
+
+#### 2. Optional
+
+Optional value fields will return some form of `null` in the target language of
+the generated code.
+
+=== "C++"
+
+    ```c++
+    std::optional<T> field;
+    ```
+
+For optional scalars, just set the field default value to `null`. If the
+producer of the buffer does not explicitly set that field, it will be marked
+`null`.
+
+```c++
+  hp:short = null;
+```
+
+!!! note
+
+    Not every language supports optional scalars yet.
+
+#### 3. Required
+
+Required valued fields will cause an error if they are not set. The FlatBuffers
+verifier would consider the whole buffer invalid.
+
+This is enabled by the [`required` attribute](#required-1) on the field.
+
+```
+  name:string (required);
+```
+
+You cannot combine `required` with an explicit default value; doing so results
+in a compiler error.
+
+## Structs
+
+Similar to a table, a `struct` consists of fields, but all of its fields are
+required (so no defaults either), and fields may not be added or deprecated.
+
+```c title="monster.fbs - Example Struct" linenums="11"
+struct Vec3 {
+  x:float;
+  y:float;
+  z:float;
+}
+```
+
+Structs may only contain scalars or other structs. Use this for simple objects
+where you are very sure no changes will ever be made (as is quite clear in the
+example `Vec3`). Structs use less memory than tables and are even faster to
+access (they are always stored in-line in their parent object, and use no
+virtual table).
+
+### Arrays
+
+Arrays are a convenience short-hand for a fixed-length collection of elements.
+Arrays allow the following syntax, while maintaining binary equivalency.
+
+<div class="grid cards" markdown>
+
+- **Normal Syntax**
+
+  ---
+
+  ```c++
+  struct Vec3 {
+    x:float;
+    y:float;
+    z:float;
+  }
+  ```
+
+- **Array Syntax**
+
+  ---
+
+  ```c++
+  struct Vec3 {
+    v:[float:3];
+  }
+  ```
+
+</div>
+
+Arrays are currently only supported in a `struct`.
+
+## Types
+
+The following are the built-in types that can be used in FlatBuffers.
+
+### Scalars
+
+The standard assortment of fixed-size scalars is available. There are no
+variable-size integers (e.g., `varints`).
+
+| Size   | Signed                  | Unsigned            | Floating Point       |
+| ------ | ----------------------- | ------------------- | -------------------- |
+| 8-bit  | `byte` (`int8`), `bool` | `ubyte` (`uint8`)   |                      |
+| 16-bit | `short` (`int16`)       | `ushort` (`uint16`) |                      |
+| 32-bit | `int` (`int32`)         | `uint` (`uint32`)   | `float` (`float32`)  |
+| 64-bit | `long` (`int64`)        | `ulong` (`uint64`)  | `double` (`float64`) |
+
+!!! note "Alias Types"
+
+    The type names in parentheses are alias names such that for example `uint8`
+    can be used in place of `ubyte`, and `int32` can be used in place of `int`
+    without affecting code generation.
+
+### Non-scalars
+
+#### Vectors
+
+A vector of any other type is denoted with `[type]`.
+
+```c++
+inventory:[ubyte];
+```
+
+!!! note "Nesting vectors"
+
+    Nesting vectors is not supported. Instead, you can wrap the inner vector in
+    a table.
+
+    ```
+    table nest {
+      a:[ubyte];
+    }
+
+    table monster {
+      a:[nest];
+    }
+    ```
+
+#### Strings
+
+Strings (indicated by `string`) are zero-terminated strings, prefixed by their
+length. Strings may only hold UTF-8 or 7-bit ASCII. For other text encodings or
+general binary data use vectors (`[byte]` or `[ubyte]`) instead.
+
+```c++
+name:string;
+```
+
+## Enums
+
+Define a sequence of named constants, each with a given value, or increasing by
+one from the previous one. The default first value is `0`. As you can see in the
+enum declaration, you specify the underlying integral type of the enum with `:`
+(in this case `byte`), which then determines the type of any fields declared
+with this enum type.
+
+Only integer types are allowed, i.e. `byte`, `ubyte`, `short`, `ushort`, `int`,
+`uint`, `long` and `ulong`.
+
+Typically, enum values should only ever be added, never removed (there is no
+deprecation for enums). This requires code to handle forwards compatibility
+itself, by handling unknown enum values.
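+
+To make the numbering rules concrete, here is a small sketch. `Color` is taken
+from the example at the top; `Mode` and the trimmed-down `Monster` table are
+hypothetical and exist only for illustration.
+
+```c
+// Explicit first value; the rest increase by one: Red = 1, Green = 2, Blue = 3.
+enum Color : byte { Red = 1, Green, Blue }
+
+// Hypothetical enum: no first value is given, so Off = 0 and On = 1.
+// Auto is set explicitly to 10.
+enum Mode : ubyte { Off, On, Auto = 10 }
+
+table Monster {
+  color:Color = Blue;  // stored as a byte
+  mode:Mode;           // stored as a ubyte, defaults to Off (0)
+}
+```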
+
+## Unions
+
+Unions share a lot of properties with enums, but instead of new names for
+constants, you use names of tables. You can then declare a union field, which
+can hold a reference to any of those types, and additionally a field with the
+suffix `_type` is generated that holds the corresponding enum value, allowing
+you to know which type to cast to at runtime.
+
+It's possible to give an alias name to a type in a union. This way a type can
+even be used to mean different things depending on the name used:
+
+```txt
+table PointPosition { x:uint; y:uint; }
+table MarkerPosition {}
+union Position {
+  Start:MarkerPosition,
+  Point:PointPosition,
+  Finish:MarkerPosition
+}
+```
+
+Unions contain a special `NONE` marker to denote that no value is stored, so
+that name cannot be used as an alias.
+
+Unions are a good way to be able to send multiple message types as a FlatBuffer.
+Note that because a union field is really two fields, it must always be part of
+a table; it cannot be the root of a FlatBuffer by itself.
+
+If you have a need to distinguish between different FlatBuffers in a more
+open-ended way, for example for use as files, see the file identification
+feature below.
+
+There is experimental support, in C++ only, for a vector of unions (and types).
+In the example IDL file above, use `[Any]` to add a vector of `Any` to the
+`Monster` table. There is also experimental support for other types besides
+tables in unions, in particular structs and strings. There's no direct support
+for scalars in unions, but they can be wrapped in a struct at no space cost.
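+
+A minimal sketch of these experimental features follows. All names here
+(`IntValue`, `Payload`, `Message`) are hypothetical, and since support is
+described as experimental and C++ only, other code generators may reject this
+schema.
+
+```c
+// Hypothetical wrapper struct so that a scalar can be carried in a union.
+struct IntValue {
+  value:int;
+}
+
+table Weapon {}
+
+// A union mixing a table and a struct (experimental).
+union Payload { Weapon, IntValue }
+
+table Message {
+  payloads:[Payload];  // vector of unions (experimental, C++ only)
+}
+```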
+
+## Namespaces
+
+These will generate the corresponding namespace in C++ for all helper code, and
+packages in Java. You can use `.` to specify nested namespaces / packages.
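+
+For instance, assuming a hypothetical nested namespace `MyGame.Sample`, the
+declaration below would be expected to place generated C++ code in
+`MyGame::Sample` and generated Java code in the package `MyGame.Sample`:
+
+```c
+namespace MyGame.Sample;
+
+table Monster {}
+```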
+
+## Includes
+
+You can include other schema files in your current one, e.g.:
+
+```txt
+include "mydefinitions.fbs";
+```
+
+This makes it easier to refer to types defined elsewhere. `include`
+automatically ensures each file is parsed just once, even when referred to more
+than once.
+
+When using the `flatc` compiler to generate code for schema definitions, only
+definitions in the current file will be generated, not those from the included
+files (those you still generate separately).
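+
+As an illustration, the sketch below splits a schema over two hypothetical
+files, `weapons.fbs` and `monster.fbs`, shown back to back in one block. Under
+the rule above, running `flatc` on `monster.fbs` alone would generate code for
+`Monster` but not for `Weapon`.
+
+```c
+// --- weapons.fbs (hypothetical file) ---
+namespace MyGame;
+
+table Weapon {
+  damage:short;
+}
+
+// --- monster.fbs (hypothetical file) ---
+include "weapons.fbs";
+
+namespace MyGame;
+
+table Monster {
+  weapon:Weapon;  // type defined in the included file
+}
+
+root_type Monster;
+```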
+
+## Root type
+
+This declares what you consider to be the root table of the serialized data.
+This is particularly important for parsing JSON data, which doesn't include
+object type information.
+
+## File identification and extension
+
+Typically, a FlatBuffer binary buffer is not self-describing, i.e. it needs you
+to know its schema to parse it correctly. But if you want to use a FlatBuffer as
+a file format, it would be convenient to be able to have a "magic number" in
+there, like most file formats have, to be able to do a sanity check to see if
+you're reading the kind of file you're expecting.
+
+Now, you can always prefix a FlatBuffer with your own file header, but
+FlatBuffers has a built-in way to add an identifier to a FlatBuffer that takes
+up minimal space, and keeps the buffer compatible with buffers that don't have
+such an identifier.
+
+You can specify in a schema, similar to `root_type`, that you intend for this
+type of FlatBuffer to be used as a file format:
+
+```txt
+file_identifier "MYFI";
+```
+
+Identifiers must always be exactly 4 characters long. These 4 characters will
+end up as bytes at offsets 4-7 (inclusive) in the buffer.
+
+For any schema that has such an identifier, `flatc` will automatically add the
+identifier to any binaries it generates (with `-b`), and generated calls like
+`FinishMonsterBuffer` also add the identifier. If you have specified an
+identifier and wish to generate a buffer without one, you can always still do so
+by calling `FlatBufferBuilder::Finish` explicitly.
+
+After loading a buffer, you can use a call like `MonsterBufferHasIdentifier` to
+check if the identifier is present.
+
+Note that this is best for open-ended uses such as files. If you simply wanted
+to send one of a set of possible messages over a network for example, you'd be
+better off with a union.
+
+Additionally, by default `flatc` will output binary files as `.bin`. This
+declaration in the schema will change that to whatever you want:
+
+```txt
+file_extension "ext";
+```
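+
+Put together, the tail of a schema intended as a file format might look like
+the sketch below. The identifier `MONS` and the extension `mon` are only
+example values, and `Monster` is trimmed down for brevity.
+
+```c
+table Monster {
+  hp:short = 100;
+}
+
+root_type Monster;
+
+file_identifier "MONS";  // exactly 4 characters
+file_extension "mon";    // binaries are written as .mon instead of .bin
+```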
+
+## RPC interface declarations
+
+You can declare RPC calls in a schema, that define a set of functions that take
+a FlatBuffer as an argument (the request) and return a FlatBuffer as the
+response (both of which must be table types):
+
+```txt
+rpc_service MonsterStorage {
+    Store(Monster):StoreResponse;
+    Retrieve(MonsterId):Monster;
+}
+```
+
+What code this produces and how it is used depends on the language and RPC
+system used. There is preliminary support for gRPC through the `--grpc` code
+generator; see `grpc/tests` for an example.
+
+## Comments & documentation
+
+Comments may be written as in most C-based languages. Additionally, a triple
+comment (`///`) on a line by itself signals that a comment is documentation for
+whatever is declared on the line after it
+(table/struct/field/enum/union/element), and the comment is output in the
+corresponding C++ code. Multiple such lines per item are allowed.
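+
+A short sketch of both comment styles, using a trimmed-down, hypothetical
+`Monster` table:
+
+```c
+/// A creature in the game world.
+/// Multiple documentation lines are allowed.
+table Monster {
+  /// Remaining hit points.
+  hp:short = 100;
+
+  // A plain comment like this one is not treated as documentation.
+  mana:short = 150;
+}
+```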
+
+## Attributes
+
+Attributes may be attached to a declaration, behind a field/enum value, or after
+the name of a table/struct/enum/union. These may either have a value or not.
+Some attributes like `deprecated` are understood by the compiler; user defined
+ones need to be declared with the attribute declaration (like `priority` in the
+example above), and are available to query if you parse the schema at runtime.
+This is useful if you write your own code generators/editors etc., and you wish
+to add additional information specific to your tool (such as a help text).
+
+Currently understood attributes:
+
+- `id: n` (on a table field): manually set the field identifier to `n`. If you
+  use this attribute, you must use it on ALL fields of this table, and the
+  numbers must be a contiguous range from 0 onwards. Additionally, since a union
+  type effectively adds two fields, its id must be that of the second field (the
+  first field is the type field and not explicitly declared in the schema). For
+  example, if the last field before the union field had id 6, the union field
+  should have id 8, and the union's type field will implicitly be 7. IDs allow
+  the fields to be placed in any order in the schema. When a new field is added
+  to the schema it must use the next available ID.
+- `deprecated` (on a field): do not generate accessors for this field anymore,
+  code should stop using this data. Old data may still contain this field, but
+  it won't be accessible anymore by newer code. Note that if you deprecate a
+  field that was previously required, old code may fail to validate new data (when
+  using the optional verifier).
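+
+The `id` rules can be sketched as follows. The table is a hypothetical variant
+of `Monster`. Note that the fields are deliberately listed out of id order, and
+that the union field skips one id to leave room for its implicit type field.
+
+```c
+table Weapon {}
+table Pickup {}
+
+union Any { Weapon, Pickup }
+
+table Monster {
+  name:string (id: 1);
+  hp:short = 100 (id: 0);
+  equipped:Any (id: 3);     // the implicit equipped_type field takes id 2
+  mana:short = 150 (id: 4); // a newly added field uses the next available id
+}
+```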
+
+### `required`
+
+- `required` (on a non-scalar table field): this field must always be set. By
+  default, fields do not need to be present in the binary. This is desirable, as
+  it helps with forwards/backwards compatibility, and flexibility of data
+  structures. By specifying this attribute, you make non-presence an error
+  for both reader and writer. The reading code may access the field directly,
+  without checking for null. If the constructing code does not initialize this
+  field, it will trigger an assert, and the verifier will also fail on buffers
+  that have missing required fields. Both adding and removing this attribute may
+  be forwards/backwards incompatible, as readers will be unable to read old or
+  new data, respectively, unless the data happens to always have the field set.
+- `force_align: size` (on a struct): force the alignment of this struct to be
+  something higher than what it is naturally aligned to. Causes these structs to
+  be aligned to that amount inside a buffer, IF that buffer is allocated with
+  that alignment (which is not necessarily the case for buffers accessed
+  directly inside a `FlatBufferBuilder`). Note: currently not guaranteed to have
+  an effect when used with `--object-api`, since that may allocate objects at
+  alignments less than what you specify with `force_align`.
+- `force_align: size` (on a vector): force the alignment of this vector to be
+  something different than what the element size would normally dictate. Note:
+  this currently only works for generated C++ code.
+- `bit_flags` (on an unsigned enum): the values of this field indicate bits,
+  meaning that any unsigned value N specified in the schema will end up
+  representing `1<<N`, or if you don't specify values at all, you'll get the
+  sequence 1, 2, 4, 8, ...
+- `nested_flatbuffer: "table_name"` (on a field): this indicates that the field
+  (which must be a vector of ubyte) contains flatbuffer data, for which the root
+  type is given by `table_name`. The generated code will then produce a
+  convenient accessor for the nested FlatBuffer.
+- `flexbuffer` (on a field): this indicates that the field (which must be a
+  vector of ubyte) contains flexbuffer data. The generated code will then
+  produce a convenient accessor for the FlexBuffer root.
+- `key` (on a field): this field is meant to be used as a key when sorting a
+  vector of the type of table it sits in. Can be used for in-place binary
+  search.
+- `hash` (on a field): this is an (un)signed 32/64 bit integer field, whose
+  value during JSON parsing is allowed to be a string, which will then be stored
+  as its hash. The value of the attribute is the hashing algorithm to use, one of
+  `fnv1_32`, `fnv1_64`, `fnv1a_32`, `fnv1a_64`.
+- `original_order` (on a table): since elements in a table do not need to be
+  stored in any particular order, they are often optimized for space by sorting
+  them by size. This attribute stops that from happening. There should generally
+  not be any reason to use this flag.
+- `native_*`: several attributes have been added to support the [C++ object
+  Based API](@ref flatbuffers_cpp_object_based_api). All such attributes are
+  prefixed with the term `native_`.
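+
+A few of the attributes above, combined in one hypothetical schema (the enum
+`Ability` and the field names are made up for illustration):
+
+```c
+// With bit_flags, the values become 1 << 0, 1 << 1, 1 << 2, i.e. 1, 2, 4.
+enum Ability : ubyte (bit_flags) { Flying, Swimming, Burrowing }
+
+table Monster {
+  name:string (key);            // sort/search key for vectors of Monster
+  abilities:Ability;            // may combine several flags
+  tag:uint (hash: "fnv1a_32");  // JSON may give a string, stored as its hash
+}
+```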
+
+## JSON Parsing
+
+The same parser that parses the schema declarations above is also able to parse
+JSON objects that conform to this schema. So, unlike other JSON parsers, this
+parser is strongly typed, and parses directly into a FlatBuffer (see the
+compiler documentation on how to do this from the command line, or the C++
+documentation on how to do this at runtime).
+
+Besides needing a schema, there are a few other changes to how it parses JSON:
+
+- It accepts field names with and without quotes, like many JSON parsers already
+  do. It outputs them without quotes as well, though can be made to output them
+  using the `strict_json` flag.
+- If a field has an enum type, the parser will recognize symbolic enum values
+  (with or without quotes) instead of numbers, e.g. `field: EnumVal`. If a field
+  is of integral type, you can still use symbolic names, but values need to be
+  prefixed with their type and need to be quoted, e.g. `field: "Enum.EnumVal"`.
+  For enums representing flags, you may place multiple inside a string separated
+  by spaces to OR them, e.g. `field: "EnumVal1 EnumVal2"` or
+  `field: "Enum.EnumVal1 Enum.EnumVal2"`.
+- Similarly, for unions, these need to be specified with two fields much like you
+  do when serializing from code. E.g. for a field `foo`, you must add a field
+  `foo_type: FooOne` right before the `foo` field, where `FooOne` would be the
+  table out of the union you want to use.
+- A field that has the value `null` (e.g. `field: null`) is intended to have the
+  default value for that field (thus has the same effect as if that field wasn't
+  specified at all).
+- It has some built in conversion functions, so you can write for example
+  `rad(180)` wherever you'd normally write `3.14159`. Currently supports the
+  following functions: `rad`, `deg`, `cos`, `sin`, `tan`, `acos`, `asin`,
+  `atan`.
+
+When parsing JSON, it recognizes the following escape codes in strings:
+
+- `\n` - linefeed.
+- `\t` - tab.
+- `\r` - carriage return.
+- `\b` - backspace.
+- `\f` - form feed.
+- `\"` - double quote.
+- `\\` - backslash.
+- `\/` - forward slash.
+- `\uXXXX` - 16-bit unicode code point, converted to the equivalent UTF-8
+  representation.
+- `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is not
+  in the JSON spec (see http://json.org/), but is needed to be able to encode
+  arbitrary binary in strings to text and back without losing information (e.g.
+  the byte 0xFF can't be represented in standard JSON).
+
+It also generates these escape codes back again when generating JSON from a
+binary representation.
+
+When parsing numbers, the parser is more flexible than JSON. The format of
+numeric literals is closer to that of C/C++. According to the
+[grammar](/overview/grammar), it accepts the following numerical literals:
+
+- An integer literal can have any number of leading zero `0` digits. Unlike
+  C/C++, the parser ignores a leading zero, not interpreting it as the beginning
+  of the octal number. The numbers `[081, -00094]` are equal to `[81, -94]`
+  decimal integers.
+- The parser accepts unsigned and signed hexadecimal integer numbers. For
+  example: `[0x123, +0x45, -0x67]` are equal to `[291, 69, -103]` decimals.
+- The format of floating-point numbers is fully compatible with C/C++ format. If a
+  modern C++ compiler is used the parser accepts hexadecimal and special
+  floating-point literals as well:
+  `[-1.0, 2., .3e0, 3.e4, 0x21.34p-5, -inf, nan]`.
+
+  The following conventions for floating-point numbers are used:
+
+  - The exponent suffix of hexadecimal floating-point number is mandatory.
+  - A parsed `NaN` is converted to an unsigned IEEE-754 `quiet-NaN` value.
+
+  Extended floating-point support was tested with:
+
+  - x64 Windows: `MSVC2015` and higher.
+  - x64 Linux: `LLVM 6.0`, `GCC 4.9` and higher.
+
+  For details, see [Use in C++](@ref flatbuffers_guide_use_cpp) section.
+
+- For compatibility with a JSON lint tool all numeric literals of scalar fields
+  can be wrapped in a quoted string:
+  `"1", "2.0", "0x48A", "0x0C.0Ep-1", "-inf", "true"`.
+
+## Guidelines
+
+### Efficiency
+
+FlatBuffers is all about efficiency, but to realize that efficiency you require
+an efficient schema. There are usually multiple choices on how to represent data
+that have vastly different size characteristics.
+
+It is very common nowadays to represent any kind of data as dictionaries (as in
+e.g. JSON), because of its flexibility and extensibility. While it is possible
+to emulate this in FlatBuffers (as a vector of tables with key and value(s)),
+this is a bad match for a strongly typed system like FlatBuffers, leading to
+relatively large binaries. FlatBuffer tables are more flexible than
+classes/structs in most systems, since having a large number of fields, only a
+few of which are actually used, is still efficient. You should thus try to organize
+your data as much as possible such that you can use tables where you might be
+tempted to use a dictionary.
+
+Similarly, strings as values should only be used when they are truly open-ended.
+If you can, always use an enum instead.
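+
+The contrast can be sketched with two hypothetical layouts for the same
+settings data. All names are made up for illustration.
+
+```c
+// Dictionary-style layout: flexible, but every entry stores its key and its
+// value as strings.
+table KeyValue {
+  key:string;
+  value:string;
+}
+
+table LooseSettings {
+  entries:[KeyValue];
+}
+
+// Table-style layout for the same data: typed fields and an enum in place of
+// a string value. Fields left at their defaults are not stored.
+enum Quality : ubyte { Low, Medium, High }
+
+table Settings {
+  quality:Quality = Medium;
+  fullscreen:bool = false;
+  volume:ubyte = 80;
+}
+```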
+
+FlatBuffers doesn't have inheritance, so the way to represent a set of related
+data structures is a union. Unions do have a cost however, so an alternative to
+a union is to have a single table that has all the fields of all the data
+structures you are trying to represent, if they are relatively similar / share
+many fields. Again, this is efficient because non-present fields are cheap.
+
+FlatBuffers supports the full range of integer sizes, so try to pick the
+smallest size needed, rather than defaulting to int/long.
+
+Remember that you can share data (refer to the same string/table within a
+buffer), so factoring out repeating data into its own data structure may be
+worth it.
+
+### Style guide
+
+Identifiers in a schema are meant to translate to many different programming
+languages, so using the style of your "main" language is generally a bad idea.
+
+For this reason, below is a suggested style guide to adhere to, to keep schemas
+consistent for interoperation regardless of the target language.
+
+Where possible, the code generators for specific languages will generate
+identifiers that adhere to the language style, based on the schema identifiers.
+
+- Table, struct, enum and rpc names (types): UpperCamelCase.
+- Table and struct field names: snake_case. This is translated to lowerCamelCase
+  automatically for some languages, e.g. Java.
+- Enum values: UpperCamelCase.
+- namespaces: UpperCamelCase.
+
+Formatting (this is less important, but still worth adhering to):
+
+- Opening brace: on the same line as the start of the declaration.
+- Spacing: Indent by 2 spaces. None around `:` for types, on both sides for `=`.
+
+For an example, see the schema at the top of this file.
+
+## Gotchas
+
+
+
+### Testing whether a field is present in a table
+
+Most serialization formats (e.g. JSON or Protocol Buffers) make it very explicit
+in the format whether a field is present in an object or not, allowing you to
+use this as "extra" information.
+
+FlatBuffers will not write fields that are equal to their default value,
+sometimes resulting in significant space savings. However, this also means we
+cannot disambiguate the meaning of non-presence as "written default value" or
+"not written at all". This only applies to scalar fields since only they support
+default values. Unless otherwise specified, their default is 0.
+
+If you care about the presence of scalars, most languages support "optional
+scalars." You can set `null` as the default value in the schema. `null` is a
+value that's outside of all types, so the field is always written if `add_field` is
+called. The generated field accessor should use the local language's canonical
+optional type.
+
+Some `FlatBufferBuilder` implementations have an option called `force_defaults`
+that circumvents this "not writing defaults" behavior; you can then use
+`IsFieldPresent` to query presence.
+
+Another option that works in all languages is to wrap a scalar field in a
+struct. This way it will return null if it is not present. This will be slightly
+less ergonomic, but structs don't take up any more space than the scalar they
+represent.
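+
+The schema-level options can be sketched side by side. `ShortValue` is a
+hypothetical wrapper struct, and the field names are chosen only for
+illustration.
+
+```c
+// Hypothetical wrapper struct; it is no larger than the short it holds.
+struct ShortValue {
+  value:short;
+}
+
+table Monster {
+  hp:short = 100;     // absent and "explicitly set to 100" look the same
+  mana:short = null;  // optional scalar: absence is reported as null
+  armor:ShortValue;   // struct wrapper: the accessor returns null when absent
+}
+```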
+