# Binary Values

The library implements several [binary formats](binary_formats/index.md) that encode JSON in an efficient way. Most of these formats support binary values; that is, values whose semantics are defined outside the library and that are stored only as a sequence of bytes.

JSON itself does not have a binary value. As such, binary values are an extension that this library implements to store values received by a binary format. Binary values are never created by the JSON parser, and are only part of a serialized JSON text if they have been created manually or via a binary format.
## API for binary values

```plantuml
class json::binary_t {
    -- setters --
    +void set_subtype(std::uint64_t subtype)
    +void clear_subtype()
    -- getters --
    +std::uint64_t subtype() const
    +bool has_subtype() const
}

"std::vector<uint8_t>" <|-- json::binary_t
```

By default, binary values are stored as `std::vector<std::uint8_t>`. This type can be changed by providing a template parameter to the `basic_json` type. To store binary subtypes, the storage type is extended and exposed as `json::binary_t`:

```cpp
auto binary = json::binary_t({0xCA, 0xFE, 0xBA, 0xBE});
auto binary_with_subtype = json::binary_t({0xCA, 0xFE, 0xBA, 0xBE}, 42);
```

There are several convenience functions to check and set the subtype:

```cpp
binary.has_subtype();                   // returns false
binary_with_subtype.has_subtype();      // returns true

binary_with_subtype.clear_subtype();
binary_with_subtype.has_subtype();      // returns false

binary_with_subtype.set_subtype(42);
binary.set_subtype(23);

binary.subtype();                       // returns 23
```

As `json::binary_t` is a subclass of `std::vector<std::uint8_t>`, all member functions of the vector are available:

```cpp
binary.size();  // returns 4
binary[1];      // returns 0xFE
```
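The subtype bookkeeping above can be pictured as a byte vector paired with an optional subtype. The following standalone sketch models that idea under this assumption; the class name `bytes_with_subtype` is hypothetical and this is not the library's `json::binary_t` implementation. It follows the documented convention that `subtype()` returns `std::uint64_t(-1)` when no subtype is set, and inheriting from `std::vector` keeps `size()` and `operator[]` available as described.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <limits>
#include <vector>

// Hypothetical model (not the library's json::binary_t): a byte vector that
// additionally remembers whether a subtype has been set.
class bytes_with_subtype : public std::vector<std::uint8_t>
{
  public:
    bytes_with_subtype(std::initializer_list<std::uint8_t> bytes)
        : std::vector<std::uint8_t>(bytes)
    {}

    bytes_with_subtype(std::initializer_list<std::uint8_t> bytes, std::uint64_t subtype)
        : std::vector<std::uint8_t>(bytes), m_subtype(subtype), m_has_subtype(true)
    {}

    void set_subtype(std::uint64_t subtype) noexcept
    {
        m_subtype = subtype;
        m_has_subtype = true;
    }

    void clear_subtype() noexcept
    {
        m_subtype = 0;
        m_has_subtype = false;
    }

    bool has_subtype() const noexcept
    {
        return m_has_subtype;
    }

    // follows the documented convention: std::uint64_t(-1) if no subtype is set
    std::uint64_t subtype() const noexcept
    {
        return m_has_subtype ? m_subtype : std::numeric_limits<std::uint64_t>::max();
    }

  private:
    std::uint64_t m_subtype = 0;
    bool m_has_subtype = false;
};
```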
JSON values can be constructed from `json::binary_t`:

```cpp
json j = binary;
```

Binary values are primitive values just like numbers or strings:

```cpp
j.is_binary();      // returns true
j.is_primitive();   // returns true
```

Given a binary JSON value, the `binary_t` can be accessed by reference via `get_binary()`:

```cpp
j.get_binary().has_subtype();   // returns true
j.get_binary().size();          // returns 4
```

For convenience, binary JSON values can be constructed via `json::binary`:

```cpp
auto j2 = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 23);
auto j3 = json::binary({0xCA, 0xFE, 0xBA, 0xBE});

j2 == j;                        // returns true
j3.get_binary().has_subtype();  // returns false
j3.get_binary().subtype();      // returns std::uint64_t(-1) as j3 has no subtype
```
## Serialization

Binary values are serialized differently depending on the format.

### JSON

JSON does not have a binary type, and this library does not introduce a new type as this would break conformance. Instead, binary values are serialized as an object with two keys: `bytes` holds an array of integers, and `subtype` is an integer or `null`.

??? example

    Code:

    ```cpp
    // create a binary value of subtype 42
    json j;
    j["binary"] = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 42);

    // serialize to standard output
    std::cout << j.dump(2) << std::endl;
    ```

    Output:

    ```json
    {
      "binary": {
        "bytes": [202, 254, 186, 190],
        "subtype": 42
      }
    }
    ```

!!! warning "No roundtrip for binary values"

    The JSON parser will not parse the objects generated by binary values back to binary values. This is by design to remain standards compliant. Serializing binary values to JSON is only implemented for debugging purposes.
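To make the debug shape concrete, here is a minimal standalone sketch that renders a byte array and an optional subtype as an object with `bytes` and `subtype` keys, without the indentation that `dump(2)` adds. The function name `binary_to_debug_json` is hypothetical; this is not the library's serializer, only an illustration of the object layout described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not part of the library): renders binary data in the
// debug object shape {"bytes":[...],"subtype":...} without indentation.
std::string binary_to_debug_json(const std::vector<std::uint8_t>& bytes,
                                 std::optional<std::uint64_t> subtype)
{
    std::string out = "{\"bytes\":[";
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        if (i != 0)
        {
            out += ',';
        }
        out += std::to_string(bytes[i]);  // each byte as a decimal integer
    }
    out += "],\"subtype\":";
    // a missing subtype is rendered as null
    out += subtype ? std::to_string(*subtype) : std::string("null");
    out += '}';
    return out;
}
```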
### BSON

[BSON](binary_formats/bson.md) supports binary values and subtypes. If a subtype is given, it is used and added as an unsigned 8-bit integer. If no subtype is given, the generic binary subtype 0x00 is used.

??? example

    Code:

    ```cpp
    // create a binary value of subtype 42
    json j;
    j["binary"] = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 42);

    // convert to BSON
    auto v = json::to_bson(j);
    ```

    `v` is a `std::vector<std::uint8_t>` with the following 22 elements:

    ```c
    0x16 0x00 0x00 0x00                 // number of bytes in the document
    0x05                                // binary value
    0x62 0x69 0x6E 0x61 0x72 0x79 0x00  // key "binary" + null byte
    0x04 0x00 0x00 0x00                 // number of bytes
    0x2A                                // subtype
    0xCA 0xFE 0xBA 0xBE                 // content
    0x00                                // end of the document
    ```

    Note that the serialization preserves the subtype, and deserializing `v` would yield the following value:

    ```json
    {
      "binary": {
        "bytes": [202, 254, 186, 190],
        "subtype": 42
      }
    }
    ```
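The 22-byte layout above can be reproduced by hand. The following standalone sketch assumes a document with exactly one binary element and writes the 32-bit length fields in little-endian order, as BSON requires. The names `append_int32_le` and `bson_single_binary_document` are hypothetical; this illustrates the byte layout and is not the library's writer.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// append a 32-bit integer in little-endian byte order
void append_int32_le(std::vector<std::uint8_t>& out, std::uint32_t value)
{
    for (int i = 0; i < 4; ++i)
    {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Hypothetical illustration: a BSON document holding one binary element under `key`.
// Without a subtype, the generic binary subtype 0x00 is written.
std::vector<std::uint8_t> bson_single_binary_document(const std::string& key,
                                                      const std::vector<std::uint8_t>& bytes,
                                                      std::optional<std::uint8_t> subtype)
{
    std::vector<std::uint8_t> element;
    element.push_back(0x05);                                    // element type: binary
    element.insert(element.end(), key.begin(), key.end());     // key ...
    element.push_back(0x00);                                    // ... as a null-terminated string
    append_int32_le(element, static_cast<std::uint32_t>(bytes.size()));  // number of content bytes
    element.push_back(subtype.value_or(0x00));                  // subtype
    element.insert(element.end(), bytes.begin(), bytes.end()); // content

    std::vector<std::uint8_t> doc;
    // total size counts the size field itself, the element, and the trailing 0x00
    append_int32_le(doc, static_cast<std::uint32_t>(4 + element.size() + 1));
    doc.insert(doc.end(), element.begin(), element.end());
    doc.push_back(0x00);                                        // end of the document
    return doc;
}
```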
### CBOR

[CBOR](binary_formats/cbor.md) supports binary values, but no subtypes. Subtypes are therefore serialized as tags, and every binary value is serialized as a byte string. The library chooses the smallest representation based on the length of the byte array.

??? example

    Code:

    ```cpp
    // create a binary value of subtype 42
    json j;
    j["binary"] = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 42);

    // convert to CBOR
    auto v = json::to_cbor(j);
    ```

    `v` is a `std::vector<std::uint8_t>` with the following 15 elements:

    ```c
    0xA1                            // map(1)
    0x66                            // text(6)
    0x62 0x69 0x6E 0x61 0x72 0x79   // "binary"
    0xD8 0x2A                       // tag(42)
    0x44                            // bytes(4)
    0xCA 0xFE 0xBA 0xBE             // content
    ```

    Note that the subtype is serialized as a tag. However, parsing tagged values yields a parse error unless `json::cbor_tag_handler_t::ignore` or `json::cbor_tag_handler_t::store` is passed to `json::from_cbor`. With `json::cbor_tag_handler_t::ignore`, the tag is skipped, and deserializing `v` yields the following value:

    ```json
    {
      "binary": {
        "bytes": [202, 254, 186, 190],
        "subtype": null
      }
    }
    ```
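The CBOR bytes above follow from the generic CBOR head: a 3-bit major type plus the shortest encoding of its argument (RFC 8949). The following standalone sketch assumes that rule; the names `cbor_head` and `cbor_binary` are hypothetical and this is not the library's writer. A subtype, if present, is emitted as a tag (major type 6) directly before the byte string (major type 2).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Write a CBOR head: major type in the top three bits, followed by the
// shortest encoding of `value`. Multi-byte arguments are big-endian.
void cbor_head(std::vector<std::uint8_t>& out, std::uint8_t major, std::uint64_t value)
{
    const int mt = major << 5;  // major type in the top three bits
    int width = 0;              // number of argument bytes after the initial byte
    int info = 0;               // additional information (low five bits)
    if (value < 24)
    {
        info = static_cast<int>(value);
    }
    else if (value <= 0xFF)
    {
        info = 24;
        width = 1;
    }
    else if (value <= 0xFFFF)
    {
        info = 25;
        width = 2;
    }
    else if (value <= 0xFFFFFFFF)
    {
        info = 26;
        width = 4;
    }
    else
    {
        info = 27;
        width = 8;
    }
    out.push_back(static_cast<std::uint8_t>(mt | info));
    for (int i = width - 1; i >= 0; --i)
    {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Hypothetical illustration: a binary value as a CBOR byte string,
// preceded by tag(subtype) if a subtype is present.
std::vector<std::uint8_t> cbor_binary(const std::vector<std::uint8_t>& bytes,
                                      std::optional<std::uint64_t> subtype)
{
    std::vector<std::uint8_t> out;
    if (subtype)
    {
        cbor_head(out, 6, *subtype);  // major type 6: tag
    }
    cbor_head(out, 2, bytes.size());  // major type 2: byte string
    out.insert(out.end(), bytes.begin(), bytes.end());
    return out;
}
```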
### MessagePack

[MessagePack](binary_formats/messagepack.md) supports binary values and subtypes. If a subtype is given, the ext family is used. The library chooses the smallest representation among fixext1, fixext2, fixext4, fixext8, fixext16, ext8, ext16, and ext32. The subtype is then added as a signed 8-bit integer.

If no subtype is given, the bin family (bin8, bin16, bin32) is used.

??? example

    Code:

    ```cpp
    // create a binary value of subtype 42
    json j;
    j["binary"] = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 42);

    // convert to MessagePack
    auto v = json::to_msgpack(j);
    ```

    `v` is a `std::vector<std::uint8_t>` with the following 14 elements:

    ```c
    0x81                            // fixmap1
    0xA6                            // fixstr6
    0x62 0x69 0x6E 0x61 0x72 0x79   // "binary"
    0xD6                            // fixext4
    0x2A                            // subtype
    0xCA 0xFE 0xBA 0xBE             // content
    ```

    Note that the serialization preserves the subtype, and deserializing `v` would yield the following value:

    ```json
    {
      "binary": {
        "bytes": [202, 254, 186, 190],
        "subtype": 42
      }
    }
    ```
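The choice between the ext and bin families can be sketched as follows, based on the bin and ext formats of the MessagePack specification. The names `append_be` and `msgpack_binary` are hypothetical, the subtype is assumed to already fit in a signed 8-bit integer, and this is an illustration rather than the library's writer. fixext headers are only usable for exactly 1, 2, 4, 8, or 16 bytes; other lengths fall back to ext8/16/32, which carry an explicit length before the type byte.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// append `value` as a big-endian integer of `width` bytes
void append_be(std::vector<std::uint8_t>& out, std::uint64_t value, int width)
{
    for (int i = width - 1; i >= 0; --i)
    {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Hypothetical illustration: MessagePack encoding of a binary value. With a
// subtype, the ext family is used and the subtype becomes the ext type byte;
// without one, the bin family is used.
std::vector<std::uint8_t> msgpack_binary(const std::vector<std::uint8_t>& bytes,
                                         std::optional<std::int8_t> subtype)
{
    std::vector<std::uint8_t> out;
    const std::uint64_t n = bytes.size();
    if (subtype)
    {
        switch (n)
        {
            case 1:  out.push_back(0xD4); break;  // fixext 1
            case 2:  out.push_back(0xD5); break;  // fixext 2
            case 4:  out.push_back(0xD6); break;  // fixext 4
            case 8:  out.push_back(0xD7); break;  // fixext 8
            case 16: out.push_back(0xD8); break;  // fixext 16
            default:
                if (n <= 0xFF)        { out.push_back(0xC7); append_be(out, n, 1); }  // ext 8
                else if (n <= 0xFFFF) { out.push_back(0xC8); append_be(out, n, 2); }  // ext 16
                else                  { out.push_back(0xC9); append_be(out, n, 4); }  // ext 32
                break;
        }
        out.push_back(static_cast<std::uint8_t>(*subtype));  // ext type byte
    }
    else
    {
        if (n <= 0xFF)        { out.push_back(0xC4); append_be(out, n, 1); }  // bin 8
        else if (n <= 0xFFFF) { out.push_back(0xC5); append_be(out, n, 2); }  // bin 16
        else                  { out.push_back(0xC6); append_be(out, n, 4); }  // bin 32
    }
    out.insert(out.end(), bytes.begin(), bytes.end());
    return out;
}
```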
### UBJSON

[UBJSON](binary_formats/ubjson.md) supports neither binary values nor subtypes and proposes serializing binary values as arrays of uint8 values. The library implements this translation.

??? example

    Code:

    ```cpp
    // create a binary value of subtype 42 (will be ignored in UBJSON)
    json j;
    j["binary"] = json::binary({0xCA, 0xFE, 0xBA, 0xBE}, 42);

    // convert to UBJSON
    auto v = json::to_ubjson(j);
    ```

    `v` is a `std::vector<std::uint8_t>` with the following 20 elements:

    ```c
    0x7B                                      // '{'
    0x69 0x06                                 // i 6 (length of the key)
    0x62 0x69 0x6E 0x61 0x72 0x79             // "binary"
    0x5B                                      // '['
    0x55 0xCA 0x55 0xFE 0x55 0xBA 0x55 0xBE   // content (each byte prefixed with 'U')
    0x5D                                      // ']'
    0x7D                                      // '}'
    ```

    The following code uses the type and size optimization for UBJSON:

    ```cpp
    // convert to UBJSON using the size and type optimization
    auto v = json::to_ubjson(j, true, true);
    ```

    The resulting vector has 23 elements; the optimization is not effective for examples with few values:

    ```c
    0x7B                            // '{'
    0x24                            // '$' type of the object elements
    0x5B                            // '[' array
    0x23 0x69 0x01                  // '#' i 1 number of object elements
    0x69 0x06                       // i 6 (length of the key)
    0x62 0x69 0x6E 0x61 0x72 0x79   // "binary"
    0x24 0x55                       // '$' 'U' type of the array elements: unsigned integers
    0x23 0x69 0x04                  // '#' i 4 number of array elements
    0xCA 0xFE 0xBA 0xBE             // content
    ```

    Note that the subtype (42) is **not** serialized and that UBJSON has **no binary type**; deserializing `v` therefore yields the following value:

    ```json
    {
      "binary": [202, 254, 186, 190]
    }
    ```
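The two UBJSON layouts above differ only in how the array of bytes is written. The following standalone sketch reproduces just the array part, assuming at most 127 bytes so that the count fits the int8 `i` marker; the name `ubjson_byte_array` is hypothetical and this is not the library's writer. It always writes the leading `[`, whereas in the optimized example above that marker is omitted because the enclosing object already declares the element type with `$ [`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration: UBJSON has no binary type, so bytes are written
// as an array of uint8 ('U') values. Counts are assumed to fit the int8 'i'
// marker (at most 127 elements).
std::vector<std::uint8_t> ubjson_byte_array(const std::vector<std::uint8_t>& bytes,
                                            bool optimized)
{
    assert(bytes.size() <= 127);
    std::vector<std::uint8_t> out{'['};
    if (optimized)
    {
        // '$' 'U' declares the element type and '#' 'i' n the element count;
        // elements then carry no markers and no closing ']' is written
        out.push_back('$');
        out.push_back('U');
        out.push_back('#');
        out.push_back('i');
        out.push_back(static_cast<std::uint8_t>(bytes.size()));
        out.insert(out.end(), bytes.begin(), bytes.end());
    }
    else
    {
        for (std::uint8_t b : bytes)
        {
            out.push_back('U');  // uint8 marker before every byte
            out.push_back(b);
        }
        out.push_back(']');
    }
    return out;
}
```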