EBML, Extensible Binary Meta-Language (o Extensible Binary Meta Language) es un nuevo lenguage de marcado de información conceptualmente semejante al XML pero diseñado para ser más compacto y codificar datos binarios.
Muchos lo ocnsideran una variante del concepto de «reinventar la rueda» puesto que esta necesidad está tradicionalmente cubierta con la notación ASN.1 Abstract Syntax Notation One (notación sintáctica abstracta 1).
Sin embargo es interesante conocerla, por lo que se incluye seguidamente un borrador de la especcificación:
RFC v1.0
Draft M. Nilsson
Document: ebml-1.0.txt 15th March 2004
Extensible Binary Markup Language
Copyright
Copyright (C) Martin Nilsson 2004. All rights reserved.
Abstract
The extensible binary markup language, EBML, is a binary language
for storing hierarchical, typed in data in a compact, yet easily
parsed format.
About this document
This document is currently in its draft stage and is subject to
changes. The contents should not be relied upon for
implementational purposes.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119
[RFC2119].
The definitions in this document uses ABNF [ABNF]. Note that string
terminals are case insensitive in ABNF. The interpretation of
binary values has been slightly altered so that every bit must be
explicitly printed, e.g. %b0 is one zero bit while %b000 represents
three zero bits. The following (re)definitions are used throughout
this document.
BIT = %b1 / %b0
BYTE = OCTET
WSP = SP / HTAB / CR / LF
1. Introduction
The Extensible Binary Markup Language EBML was designed to be a
simplified binary version of XML for the purpose of storing and
manipulating data in a hierarchical form with variable field
lengths. Specifically EBML was designed as the framework language
for the video container format Matroska. Some of the advantages of
EBML are:
- Possibility for compatibility between different versions of
binary languages defined in EBML. A rare property of binary
format that otherwise often needs careful consideration
beforehand.
- Unlimited size of data payload.
- Can be both generated and read as a stream, without knowing the
data size beforehand.
- Often very space efficient, even compared to other binary
representations of the same data.
Some of the EBML disadvantages are:
- No references can be made between EBML files, such as includes or
inherits. Every EBML document is a self contained entity. The
data stored in EBML may of course reference other resources.
- No compositioning process to merge two or more EBML languages
currently exists.
This document describes the EBML binary syntax, its semantic
interpretation and the syntax of a textual document type definition
format used to define the rules and constraints of an EBML
language. BNF is used throughout this document to augment the
description and make it more formalized. It must however be noted
that two different formats are described in this document, with two
different BNFs. One for the binary format in chapter 2 and one for
the document type definition in the rest of the document. To avoid
confusion different token names has been chosen. The BNFs can be
viewed in full in the appendices.
2. Structure
Syntactically an EBML document is a list of EBML elements. Each
element has three parts; an element ID, a size descriptor and the
element data payload. The element data payload is then either data,
implicitly typed by the element ID, or a list of EBML elements.
EBML = *ELEMENT
ELEMENT = ELEMENT_ID SIZE DATA
DATA = VALUE / *ELEMENT
ELEMENT_ID = VINT
SIZE = VINT
EBML uses big endian/network order byte order, i.e. most
significant bit first. All of the tokens above are byte aligned.
2.1. Variable size integer
For both element ID and size descriptor EBML uses a variable size
integer, coded according to a schema similar to that of UTF-8
[UTF-8] encoding. The variable size integer begins with zero or
more zero bits to define the width of the integer. Zero zeroes
means a width of one byte, one zero a width of two bytes etc. The
zeroes are followed by a marker of one set bit and then follows the
actual integer data. The integer data consists of alignment data
and tail data. The alignment data together with the width
descriptor and the marker makes up one ore more complete bytes. The
tail data is as many bytes as there were zeroes in the width
descriptor, i.e. width-1.
VINT = VINT_WIDTH VINT_MARKER VINT_DATA
VINT_WIDTH = *%b0
VINT_MARKER = %b1
VINT_DATA = VINT_ALIGNMENT VINT_TAIL
VINT_ALIGNMENT = *BIT
VINT_TAIL = *BYTE
An alternate way of expressing this is the following definition,
where the width is the number of levels of expansion.
VINT = ( %b0 VINT 7BIT ) / ( %b1 7BIT )
Some examples of the encoding of integers of width 1 to 4. The x:es
represent bits where the actual integer value would be stored.
Width Size Representation
1 2^7 1xxx xxxx
2 2^14 01xx xxxx xxxx xxxx
3 2^21 001x xxxx xxxx xxxx xxxx xxxx
4 2^28 0001 xxxx xxxx xxxx xxxx xxxx xxxx xxxx
2.2. Element ID
The EBML element ID is encoded as a variable size integer with, by
default, widths up to 4. Another maximum width value can be set by
setting another value to EBMLMaxIDWidth in the EBML header. See
section 5.1. IDs are always encoded in their shortest form, e.g. 1
is always encoded as 0x81 and never as 0x4001. This limits the
number of IDs in every class with the number of IDs in the previous
classes. Furthermore, values with all data bits set to 1 and all
data bits set to 0 are reserved, hence the effective number of IDs
are as follows for different widths. Note that the shortest
encoding form for 127 is 0x407f since 0x7f is reserved.
Class Width Number of IDs
A 1 2^7-2 = 126
B 2 2^14-2^7 = 16 256
C 3 2^21-2^14 = 2 080 768
D 4 2^28-2^21 = 266 338 304
2.3. Element data size
The EBML element data size is encoded as a variable size integer
with, by default, widths up to 8. Another maximum width value can
be set by setting another value to EBMLMaxSizeWidth in the EBML
header. See section 5.1. There is a range overlap between all
different widths, so that 1 encoded with width 1 is semantically
equal to 1 encoded with width 8. This allows for the element data
to shrink without having to shrink the width of the size
descriptor.
Values with all data bits set to 1 means size unknown, which allows
for dynamically generated EBML streams where the final size isn't
known beforehand. The element with unknown size MUST be an element
with an element list as data payload. The end of the element list is
determined by the ID of the element. When an element that isn't a
sub-element of the element with unknown size arrives, the element
list is ended.
Since the highest value is used for unknown size the effective
maximum data size is 2^56-2, using variable size integer width 8.
2.4. Values
Besides having an element list as data payload an element can have
its data typed with any of seven predefined data types. The actual
type information isn't stored in EBML but is inferred from the
document type definition through the element ID. The defined data
types are signed integer, unsigned integer, float, ASCII string,
UTF-8 string, date and binary data.
VALUE = INT / UINT / FLOAT / STRING / DATE / BINARY
INT = *8BYTE
Signed integer, represented in two's complement notation, sizes
from 0-8 bytes. A zero byte integer represents the integer value 0.
UINT = *8BYTE
Unsigned integer, sizes from 0-8 bytes. A zero byte integer
represents the integer value 0.
FLOAT = *1( 4BYTE / 8BYTE / 10BYTE )
IEEE float [FLOAT], sizes 0, 4, 8 or 10 bytes. A zero byte float
represents the float value 0.0.
PADDING = %x00
STRING = *BYTE *PADDING
UTF-8 [UTF-8] encoded Unicode [UNICODE] string. A string MAY be
zero padded at the end. Note that a string can be of zero length.
DATE = 8BYTE
Signed, 64-bit (8 byte) integer describing the distance in
nanoseconds to the beginning of the millennium (2001-01-01 00:00:00
UTC).
BINARY = *BYTE
Binary data, i.e. not interpreted at the EBML level.
3. Semantic interpretation
Every element has several properties defined in the document type
definition, which are needed for the correct syntactical and
semantic handling of the information. These properties are name,
parent, ID, cardinality, value restrictions, default value, value
type and child order.
To syntactically parse EBML data we need to know the element value
types, and to semantically interpret that data we also need to know
the element IDs and element names. Elements can have a default
value, so for the final presentation of the parsed EBML data
elements that wasn't stored in the data may show up. Finally
elements may have restrictions in terms of which parent or parents
they may have, the number of times they may occur in the EBML data,
their order in the document and various additional restrictions of
their data payload.
3.1. Name property
The name is the symbolic identifier of an element and has a 1:1
mapping to the element ID. Only alphanumeric characters and
underscores may be used for the name. It may not start with a
number. Names are treated case insensitive, i.e. "Name" is the same
identifier as "name".
NAME = [A-Za-z_] 1*[A-Za-z_0-9]
3.2. Value type property
There is no way of knowing whether or not to look for sub elements
with only the information presented in the EBML data. Hence the
element value type is the most important information in the EBML
DTD. In addition to the value types defined in section 2.4 an
element can also be of the type "container", which simply means
that its content is more elements.
3.3. ID property
Every element must have an ID associated, as defined in section
2.2. These IDs are expressed in the hexadecimal representation of
their encoded form, e.g. 1a45dfa3.
ID = 1*( 2HEXDIG )
3.4. Default value property
Every non-container MAY be assigned a default value. If so, its
value will be added to the interpretation of the EBML data if no
element with another value exists in the data.
As an example, consider this EBML DTD:
Weight := 4101 {
WeightValue := 41a1 uint;
WeightUnit := 41a2 string [ def:"kilogram" ];
}
If the Weight element only contains the WeightValue element, the
WeightUnit element with value "kilogram" will be added when the
information is semantically processed. A WeightUnit element with
another value would of course override the default.
The default value can also be a symbol referring back to a
previously seen symbol. If however no such symbol has been seen,
i.e. it has not been encoded into the EBML data and has no default
value, the element will not be added as a child on the semantic
level.
Weight := 4101 {
WeightValue := 41a1 uint;
WeightUnit := 41a2 string [ def:WeightUnit ];
}
In this example all Weight elements will use the same weight unit
as the previous one. To ensure that the first one has a value its
cardinality should be set to "1". See section 3.6.
DEFAULT = INT_DEF / UINT_DEF / FLOAT_DEF / STRING_DEF /
DATE_DEF / BINARY_DEF / NAME
DATE_VALUE = *1DIGIT 2DIGIT 2DIGIT
*1(%x54 2DIGIT ":" 2DIGIT ":" 2DIGIT
*1( "." *1DIGIT ))
INT_DEF = *1"-" 1*DIGIT
UINT_DEF = 1*DIGIT
FLOAT_DEF = INT "." 1*DIGIT *1( "e" *1( "+"/"-" ) 1*DIGIT )
DATE_DEF = INT_DEF / DATE_VALUE
STRING_DEF = ("0x" 1*( 2HEXDIG )) / ( %x22 *(%x20-7e) %x22 )
BINARY_DEF = STRING_DEF
The date default value is either described as the integer in its
encoded form or in the ISO short format; YYYYMMDD followed by the
string literal T, the time as HH:MM:DD and finally and optionally
the fractions as .F with an optional numbers of F for
precision. Some examples:
ExampleInt := c0 int [ def:-114; ]
ExampleUInt := c1 uint [ def:0; ]
ExampleFloat := c2 float [ def:6.022E23 ]
ExampleDate := c3 date [ def:20011224T15:00:03.21; ]
ExampleBinary := c5 binary [ def:0x4944337632; ]
ExampleString := c6 string [ def:"Sweden"; ]
3.5. Parent property
To place the elements in a hierarchical structure we need
relational information about the elements. In the EBML DTD this is
expressed as the possible parents an element may have. This can be
expressed in two ways: an explicit list of allowed parents or a
generic definition of allowed insertion depth.
PARENTS = NAME / ( NAME "," PARENTS )
LEVEL = 1*DIGIT *1( ".." *DIGIT )
An element with neither parents nor level defined is assumed to
exist on the top level in the EBML document. An element can not
have both a parent and a level property.
The following example contains two elements, Envelope and
Letter. The Letter must, if it exists in the document, be a child
of Envelope and Envelop must, if it exists in the document, reside
at the top level.
Envelope := a0 container;
Letter := b0 string [ parent:Envelope; ]
The following example expresses the exact same relationships, but
in an abbreviated syntax.
Envelope := a0 container {
Letter := b0 string;
}
The following example shows that the abbreviated syntax can't be
used to solve all relations. Here the Letter element can be a child
in both the Envelope element and the Trashcan element. The explicit
parent information is merged with the one expressed with the
abbreviated syntax.
Envelope := a0 container {
Letter := b0 string [ parent:Trashcan; ];
}
Trashcan := a1 container;
The following example demonstrates the usage of level instead of
parent. The element Void may occur at any place in the EBML
document.
Void := ec binary [ level:1..; card:*; ]
This is an example similar to the one above, but using containers
instead. The children of the parent to the SHA1 element may be
pushed down to the level where "%children;" is, i.e. if used with
the first Letter-Envelope example the Envelope may contain an SHA1
element which may contain an SHA1Content element which may contain
a Letter element. A container element with a level property MUST
NOT use undefined size as described in section 2.3, since then the
end of the element can not be determined in all cases.
SHA1 := 190babe5 container [ level:1..; card:*; ] {
SHA1Content := 20110f container {
%children;
}
SHA1Hash := 20110e binary;
}
3.6. Cardinality property
The cardinality of an element declares how many times an element
may occur in its current scope. By default an element may only
occur at most once in a scope, e.g. if the element Weight has been
defined as a child to Brick, no more than one Weight element can be
used in every Brick element. The cardinality can however be altered
from the default to any of the following. Note that this affects
all scopes that the element can be inserted into.
Symbol Interpretation
* Zero or more occurrences.
? Zero or one occurrence (default).
1 Exactly one occurrence.
+ One or more occurrences.
CARDINALITY = "*" / "?" / "1" / "+"
3.7. Child order property
The child order property only applies to container elements. It
simply declares if the children of the element must appear in the
order they are defined in or not. By default this restriction is
imposed on all children to all elements. The advantage of ordered
elements in combination with default values is that the EBML
decoder will know immediately once an element is skipped and can
output the appropriate default value instead.
YES = "yes" / "1"
NO = "no" / "0"
ORDERED = YES / NO
3.8. Value restriction properties
Every element may impose additional constraints upon its
value. These constraints are only used to validate data during
encoding and makes no difference for the decoding or interpretation
of the encoded data. The available constraints and syntax differs
between different types of elements.
3.8.1. Range
The range of an element value determines the allowed values that
the element value may have. The range is expressed as one or more
specific values or spans of allowed values, with the exception of
floats and dates where only spans are allowed.
RANGE_LIST = RANGE_ITEM / ( RANGE_ITEM S "," S RANGE_LIST )
RANGE_ITEM = INT_RANGE / UINT_RANGE / FLOAT_RANGE /
STRING_RANGE / DATE_RANGE / BINARY_RANGE
For integers the range is a list of values and open or closed
ranges, e.g. "0,1", "1..5" and "..-1,1..". The final allowed range
is a union of all listed ranges. If a value matches any of the
listed ranges it is considered valid.
INT_RANGE = INT_DEF / ( INT_DEF ".." ) / ( ".." INT_DEF ) /
( INT_DEF ".." INT_DEF )
Unsigned integers is similar, but can not have its range open to
the left.
UINT_RANGE = UINT_DEF *1( ".." UINT_DEF )
Floats can have both inclusive and exclusive end points of its
ranges.
FLOAT_RANGE = ( ("<" / "<="/ ">" / ">=") FLOAT_DEF ) /
( FLOAT_DEF "<"/"<=" ".." "<"/"<=" FLOAT_DEF )
Date ranges has the same syntax and semantics as integer ranges,
except that the actual date can be described either as an integer
or in ISO short format.
DATE_RANGE = ( DATE_DEF ".." ) / ( ".." DATE_DEF ) /
( DATE_DEF ".." DATE_DEF )
String and binary ranges limits the range for each character/byte
in the string. Note that in the string case this is for the
unencoded unicode data, i.e. the possible range is larger than
0-255, which is the bounding range for binary data.
STRING_RANGE = UINT_RANGE
BINARY_RANGE = UINT_RANGE
3.8.2. Size
The size of an element value is simply the number of bytes it
occupies in its encoded form. That means that the size of a string
element value need not be the same as the string length.
SIZE_LIST = UINT_RANGE / ( UINT_RANGE S "," S SIZE_LIST )
4. Document Type Definition
The EBML document type definition, EDTD, is an ASCII based language
that allows for the constraints and relations described in chapter
3 to be described in a way that is both human and computer
readable. Syntactically it consists of blocks, not unlike most
programming languages, which contains definitions/declarations
similar to BNF. The format is largely whitespace insensitive, case
insensitive and supports both C-style (LCOMMENT) and C++-style
(BCOMMENT) comments. The BNF lines in this chapter is somewhat
simplified for readability compared to the complete BNF in Appendix
B.
COMMENT = LCOMMENT / BCOMMENT
S = *WSP / ( *WSP COMMENT *WSP ) ; Optional white spaces
On the top level of the EDTD only three different blocks are
currently defined, header declarations, type definitions and
element definitions.
DTD = *( S / HBLOCK / TBLOCK / EBLOCK )
4.1. Header declarations
The meaning of the header declaration is to declare what values
should be put into the elements of the header, should they differ
from the default values. The header declaration block is written as
"declare header" followed by curly brackets that encloses the block
of statements. The actual statements are very straightforward, the
element name followed by ":=" followed by the value, as described
in section 3.4, followed by ";". There MUST only be one header
declaration block in a DTD.
HBLOCK = "declare" WSP "header" S "{" *(S / STATEMENT) "}"
STATEMENT = NAME S ":=" S DEFS S ";"
Since the DocType element has no default value it MUST be declared
in the EDTD. It is RECOMMENDED that the EBMLVersion is also
declared. Such a declaration could look like this:
declare header {
DocType := "xhtml";
EBMLVersion := 1;
}
4.2. Type definitions
Type definitions is a way to create types with more mnemonic names,
making the DTD both smaller and easier to read. The type
definitions block is written as "define types" followed by a block
of statements enclosed in curly brackets. Each statement is a type
name followed by ":=" followed by the type on which this type
should be based on, optionally followed by a list of properties,
enclosed in angle brackets. The type names follows the same rules
as element names, but lives in another namespace, i.e. there may be
an element with the same name as a type.
TBLOCK = "define" S WSP "types" S "{" *(S / DTYPE) "}"
DTYPE = NAME S ":=" S TYPE S (PROPERTIES S *1";")/";"
The base type may be another defined type, as long as the
definitions are made in order, as shown in the following example.
define types {
digits := int;
number := digits;
}
The type definition then both allows for types as described in
section 2.4 and NAME, which refers to types previously defined in
the document.
TYPE = VTYPE / CTYPE
VTYPE = "int" / "uint" / "float" / "string" / "date" /
"binary" / NAME
CTYPE = "conainer" / NAME
If the type definition has no list of properties, the statement is
ended with ";". If the it has a property list, ending the statement
with ";" is optional.
PROPERTIES = "[" S 1*PROPERTY S "]"
PROPERTY = PROP_NAME S ":" S PROP_VALUE S ";"
Some examples:
crc32 := binary [ size:4; ]
sha1 := binary [ size:20; ]
bool := uint [ range:0..1; ]
us_printable := binary [ range:32..126; ]
4.2. Element definitions
The element definitions is the real purpose of the DTD. The element
definitions block is written as "define elements" followed by a
block of statements enclosed in curly brackets. Each statement can
be either a simple statement similar to the type definition
statements, or a block statement, containing more statements.
EBLOCK = "define" WSP "elements" S "{" *(S / ELEMENT) "}"
DELEMENT = VELEMENT / CELEMENT / "%children;"
The simple statements are typically used for value elements and
consists of a name followed by ":=", id, type and optionally
properties.
VELEMENT = NAME S ":=" S ID WSP S TYPE S (PROPERTIES S *1";")/";"
The block version of the element statements are only used to
express parent-children relations. See section 3.5.
CELEMENT = NAME S ":=" S ID WSP S "container" S *1PROPERTIES S
("{" *DELEMENT "}")/";"
5. EBML standard elements
EBML defines a small set of elements that can be used in any EBML
application. An EBML document MUST begin with an EBML header
consisting of the EBML element. Generally speaking it would be
possible to define default values to all elements in the EBML
element in the document type definition for an application, and
thus being able to represent the entire header without have a
single byte written. However, in order to be able to identify EBML
documents between applications it is REQUIRED that all EBML
elements whose values differs from the standard defaults in this
document, are written in EBML data. In practice that means that at
least the DocType is always stored in all EBML documents.
5.1. EBML
The EBML element is a container for the EBML header.
EBML := 1a45dfa3 container [ card:+; ]
5.1.1. EBMLVersion
EBMLVersion is the version of EBML to which the document conforms
to.
EBMLVersion := 4286 uint [ def:1; parent:EBML; ]
5.1.2. EBMLReadVersion
The minimum EBML version a parser has to support in order to read
the document.
EBMLReadVersion := 42f7 uint [ def:1; parent:EBML; ]
5.1.3. EBMLMaxIDWidth
The maximum width of the IDs used in this document. It is
RECOMMENDED to not have wider IDs than 4. EBMLMaxIDWidth may be
larger than any actual width used in the document.
EBMLMaxIDWidth := 42f2 uint [ def:4; parent:EBML; ]
5.1.4. EBMLMaxSizeWidth
The maximum width of the size descriptors used in this document. It
is RECOMMENDED to not have wider size descriptors than 8.
EBMLMaxSizeWidth may be larger than any actual width used in the
document.
EBMLMaxSizeWidth := 42f3 uint [ def:8; parent:EBML; ]
5.1.5. DocType
An ASCII string that identifies the type of the document.
DocType := 4282 binary [ range:32..126; parent:EBML; ]
5.1.6. DocTypeVersion
DocTypeVersion is the version of document type to which the
document conforms to.
DocTypeVersion := 4287 uint [ def:1; parent:EBML; ]
5.1.7. DocTypeReadVersion
The minimum DocType version an interpreter has to support in order
to read the document.
DocTypeReadVersion := 4285 uint [ def:1; parent:EBML; ]
5.2. CRC32
The CRC32 container can be placed around any EBML element or
elements. The value stored in CRC32Value is the result of the
CRC-32 [CRC32] checksum performed on the other child elements.
CRC32 := c3 container [ level:1..; card:*; ] {
%children;
CRC32Value := 42fe binary [ size:4; ]
}
5.3. Void
The void element can be used as padding to prepare for more data,
or to fill space when data has been removed. It should simply be
ignored when the document is interpreted.
Void := ec binary [ level:1..; card:*; ]
6. References
[ABNF] D. Crocker, P. Overell, 'Augmented BNF for Syntax
Specifications: ABNF', RFC 2234, November 1997.
<url:ftp://ftp.isi.edu/in-notes/rfc2234.txt>
[CRC32] International Organization for Standardization, 'ISO
Information Processing Systems - Data Communication High-Level Data
Link Control Procedure - Frame Structure", IS 3309, October 1984,
3rd Edition.
[FLOAT] Institute of Electrical and Electronics Engineers, 'IEEE
Standard for Binary Floating-Point Arithmetic', ANSI/IEEE Standard
754-1985, August 1985.
[RFC2119] S. Bradner, 'Key words for use in RFCs to Indicate
Requirement Levels', RFC 2119, March 1997.
<url:ftp://ftp.isi.edu/in-notes/rfc2119.txt>
[UNICODE] International Organization for Standardization,
'Universal Multiple-Octet Coded Character Set (UCS), Part 1:
Architecture and Basic Multilingual Plane', ISO/IEC 10646-1:1993.
<url:http://www.unicode.org>
[UTF-8] F. Yergeau, 'UTF-8, a transformation format of ISO 10646',
RFC 3629, November 2003.
<url:ftp://ftp.isi.edu/in-notes/rfc3629.txt>
Appendix A. EBML BNF
EBML = *ELEMENT
ELEMENT = ELEMENT_ID SIZE DATA
DATA = VALUE / *ELEMENT
VINT = ( %b0 VINT 7BIT ) / ( %b1 7BIT )
; A more annotated but less correct definition of VINT
;
; VINT = VINT_WIDTH VINT_MARKER VINT_DATA
; VINT_WIDTH = *%b0
; VINT_MARKER = %b1
; VINT_DATA = VINT_ALIGNMENT VINT_TAIL
; VINT_ALIGNMENT = *BIT
; VINT_TAIL = *BYTE
ELEMENT_ID = VINT
SIZE = VINT
VALUE = INT / UINT / FLOAT / STRING / DATE / BINARY
PADDING = %x00
INT = *8BYTE
UINT = *8BYTE
FLOAT = *1( 4BYTE / 8BYTE / 10BYTE )
STRING = *BYTE *PADDING
DATE = 8BYTE
BINARY = *BYTE
Appendix B. EBML DTD BNF
; NOTE: This BNF is not correct in that it allows for more freedom
; than what is described in the text. That is because a 100%
; correct BNF would be almost unreadable. To be correct CELEMENT
; would be split into one ELEMENT token for every value type, and
; then each and every one of them would have their own PROPERTIES
; definition which points out only the DEF and RANGE for that value
; type. Some other shortcuts are noted in comments.
LCOMMENT = "//" *BYTE (CR / LF) ; *BYTE is string without CR/LF
BCOMMENT = "/*" *BYTE "*/" ; *BYTE is string without "*/"
COMMENT = LCOMMENT / BCOMMENT ; Line comment / Block comment
S = *WSP / ( *WSP COMMENT *WSP ) ; Optional white spaces
DTD = *( S / HBLOCK / TBLOCK / EBLOCK )
HBLOCK = "declare" S WSP "header" S "{" *(S / STATEMENT) "}"
EBLOCK = "define" S WSP "elements" S "{" *(S / DELEMENT) "}"
TBLOCK = "define" S WSP "types" S "{" *(S / DTYPE) "}"
DELEMENT = VELEMENT / CELEMENT / "%children;"
VELEMENT = NAME S ":=" S ID WSP S TYPE S (PROPERTIES S *1";")/";"
CELEMENT = NAME S ":=" S ID WSP S "container" S *1PROPERTIES S
("{" *DELEMENT "}")/";"
NAME = [A-Za-z_] 1*[A-Za-z_0-9]
ID = 1*( 2HEXDIG )
TYPE = VTYPE / CTYPE
VTYPE = "int" / "uint" / "float" / "string" / "date" /
"binary" / NAME
CTYPE = "conainer" / NAME
PROPERTIES = "[" S 1*PROPERTY S "]"
PROPERTY = PARENT / LEVEL / CARD / DEF / RANGE / SIZE
PARENT = "parent" S ":" S PARENTS S ";"
PARENTS = NAME / ( NAME S "," S PARENTS )
LEVEL = "level" S ":" S 1*DIGIT *(".." *DIGIT) S ";"
CARD = "card" S ":" S ( "*" / "?" / "1" / "+" ) S ";"
ORDERED = "ordered" S ":" S ( YES / NO ) S ";"
YES = "yes" / "1"
NO = "no" / "0"
DEF = "def" S ":" S DEFS S ";"
DEFS = ( INT_DEF / UINT_DEF / FLOAT_DEF / STRING_DEF /
DATE_DEF / BINARY_DEF / NAME )
RANGE = "range" S ":" S RANGE_LIST S ";"
RANGE_LIST = RANGE_ITEM / ( RANGE_ITEM S "," S RANGE_LIST )
RANGE_ITEM = INT_RANGE / UINT_RANGE / FLOAT_RANGE /
STRING_RANGE / DATE_RANGE / BINARY_RANGE
SIZE = "size" S ":" S SIZE_LIST S ";"
SIZE_LIST = UINT_RANGE / ( UINT_RANGE S "," S SIZE_LIST )
; Actual values, but INT_VALUE is too long.
INT_V = *1"-" 1*DIGIT
FLOAT_V = INT "." 1*DIGIT *1( "e" *1( "+"/"-" ) 1*DIGIT )
; DATE uses ISO short format, yyyymmddThh:mm:ss.f
DATE_V = *1DIGIT 2DIGIT 2DIGIT
*1(%x54 2DIGIT ":" 2DIGIT ":" 2DIGIT *1( "." *1DIGIT ))
INT_DEF = INT_V
UINT_DEF = 1*DIGIT
FLOAT_DEF = FLOAT_V
DATE_DEF = INT_DEF / DATE_V
STRING_DEF = ("0x" 1*( 2HEXDIG )) / ( %x22 *(%x20-7e) %x22 )
BINARY_DEF = STRING_DEF
INT_RANGE = INT_V / ( INT_V ".." ) / ( ".." INT_V ) /
( INT_V ".." INT_V )
UINT_RANGE = 1*DIGIT *1( ".." *DIGIT )
FLOAT_RANGE = ( ("<" / "<=" / ">" / ">=") FLOAT_DEF ) /
( FLOAT_DEF "<"/"<=" ".." "<"/"<=" FLOAT_DEF )
DATE_RANGE = (1*DIGIT / DATE_V) *1( ".." *(DIGIT / DATE_V) )
BINARY_RANGE = UINT_RANGE
STRING_RANGE = UINT_RANGE
STATEMENT = NAME S ":=" S DEFS S ";"
; TYPE must be defined. PROPERTIES must only use DEF and RANGE.
DTYPE = NAME S ":=" S TYPE S (PROPERTIES S *1";")/";"
Appendix C. EBML Standard definitions
define elements {
EBML := 1a45dfa3 container [ card:+; ] {
EBMLVersion := 4286 uint [ def:1; ]
EBMLReadVersion := 42f7 uint [ def:1; ]
EBMLMaxIDLength := 42f2 uint [ def:4; ]
EBMLMaxSizeLength := 42f3 uint [ def:8; ]
DocType := 4282 string [ range:32..126; ]
DocTypeVersion := 4287 uint [ def:1; ]
DocTypeReadVersion := 4285 uint [ def:1; ]
}
CRC32 := c3 container [ level:1..; card:*; ] {
%children;
CRC32Value := 42fe binary [ size:4; ]
}
Void := ec binary [ level:1..; card:*; ]
}
Appendix D. Matroska DTD
declare header {
DocType := "matroska";
EBMLVersion := 1;
}
define types {
bool := uint [ range:0..1; ]
ascii := string [ range:32..126; ]
}
define elements {
Segment := 18538067 container [ card:*; ] {
// Meta Seek Information
SeekHead := 114d9b74 container [ card:*; ] {
Seek := 4dbb container [ card:*; ] {
SeekID := 53ab binary;
SeekPosition := 53ac uint;
}
}
// Segment Information
Info := 1549a966 container [ card:*; ] {
SegmentUID := 73a4 binary;
SegmentFilename := 7384 string;
PrevUID := 3cb923 binary;
PrevFilename := 3c83ab string;
NextUID := 3eb923 binary;
NextFilename := 3e83bb string;
TimecodeScale := 2ad7b1 uint [ def:1000000; ]
Duration := 4489 float [ range:>0.0; ]
DateUTC := 4461 date;
Title := 7ba9 string;
MuxingApp := 4d80 string;
WritingApp := 5741 string;
}
// Cluster
Cluster := 1f43b675 container [ card:*; ] {
Timecode := e7 uint;
Position := a7 uint;
PrevSize := ab uint;
BlockGroup := a0 container [ card:*; ] {
Block := a1 binary;
BlockVirtual := a2 binary;
BlockAdditions := 75a1 container {
BlockMore := a6 container [ card:*; ] {
BlockAddID := ee uint [ range:1..; ]
BlockAdditional := a5 binary;
}
}
BlockDuration := 9b uint [ def:TrackDuration; ];
ReferencePriority := fa uint;
ReferenceBlock := fb int [ card:*; ]
ReferenceVirtual := fd int;
CodecState := a4 binary;
Slices := 8e container [ card:*; ] {
TimeSlice := e8 container [ card:*; ] {
LaceNumber := cc uint [ def:0; ]
FrameNumber := cd uint [ def:0; ]
BlockAdditionID := cb uint [ def:0; ]
Delay := ce uint [ def:0; ]
Duration := cf uint [ def:TrackDuration; ];
}
}
}
}
// Track
Tracks := 1654ae6b container [ card:*; ] {
TrackEntry := ae container [ card:*; ] {
TrackNumber := d7 uint [ range:1..; ]
TrackUID := 73c5 uint [ range:1..; ]
TrackType := 83 uint [ range:1..254; ]
FlagEnabled := b9 uint [ range:0..1; def:1; ]
FlagDefault := 88 uint [ range:0..1; def:1; ]
FlagLacing := 9c uint [ range:0..1; def:1; ]
MinCache := 6de7 uint [ def:0; ]
MaxCache := 6df8 uint;
DefaultDuration := 23e383 uint [ range:1..; ]
TrackTimecodeScale := 23314f float [ range:>0.0; def:1.0; ]
Name := 536e string;
Language := 22b59c string [ def:"eng"; range:32..126; ]
CodecID := 86 string [ range:32..126; ];
CodecPrivate := 63a2 binary;
CodecName := 258688 string;
CodecSettings := 3a9697 string;
CodecInfoURL := 3b4040 string [ card:*; range:32..126; ]
CodecDownloadURL := 26b240 string [ card:*; range:32..126; ]
CodecDecodeAll := aa uint [ range:0..1; def:1; ]
TrackOverlay := 6fab uint;
// Video
Video := e0 container {
FlagInterlaced := 9a uint [ range:0..1; def:0; ]
StereoMode := 53b8 uint [ range:0..3; def:0; ]
PixelWidth := b0 uint [ range:1..; ]
PixelHeight := ba uint [ range:1..; ]
DisplayWidth := 54b0 uint [ def:PixelWidth; ]
DisplayHeight := 54ba uint [ def:PixelHeight; ]
DisplayUnit := 54b2 uint [ def:0; ]
AspectRatioType := 54b3 uint [ def:0; ]
ColourSpace := 2eb524 binary;
GammaValue := 2fb523 float [ range:>0.0; ]
}
// Audio
Audio := e1 container {
SamplingFrequency := b5 float [ range:>0.0; def:8000.0; ]
OutputSamplingFrequency := 78b5 float [ range:>0.0;
def:8000.0; ]
Channels := 94 uint [ range:1..; def:1; ]
ChannelPositions := 7d7b binary;
BitDepth := 6264 uint [ range:1..; ]
}
// Content Encoding
ContentEncodings := 6d80 container {
ContentEncoding := 6240 container [ card:*; ] {
ContentEncodingOrder := 5031 uint [ def:0; ]
ContentEncodingScope := 5032 uint [ range:1..; def:1; ]
ContentEncodingType := 5033 uint;
ContentCompression := 5034 container {
ContentCompAlgo := 4254 uint [ def:0; ]
ContentCompSettings := 4255 binary;
}
ContentEncryption := 5035 container {
ContentEncAlgo := 47e1 uint [ def:0; ]
ContentEncKeyID := 47e2 binary;
ContentSignature := 47e3 binary;
ContentSigKeyID := 47e4 binary;
ContentSigAlgo := 47e5 uint;
ContentSigHashAlgo := 47e6 uint;
}
}
}
}
}
// Cueing Data
Cues := 1c53bb6b container {
CuePoint := bb container [ card:*; ] {
CueTime := b3 uint;
CueTrackPositions := b7 container [ card:*; ] {
CueTrack := f7 uint [ range:1..; ]
CueClusterPosition := f1 uint;
CueBlockNumber := 5378 uint [ range:1..; def:1; ]
CueCodecState := ea uint [ def:0; ]
CueReference := db container [ card:*; ] {
CueRefTime := 96 uint;
CueRefCluster := 97 uint;
CueRefNumber := 535f uint [ range:1..; def:1; ]
CueRefCodecState := eb uint [ def:0; ]
}
}
}
}
// Attachment
Attachments := 1941a469 container {
AttachedFile := 61a7 container [ card:*; ] {
FileDescription := 467e string;
FileName := 466e string;
FileMimeType := 4660 string [ range:32..126; ]
FileData := 465c binary;
FileUID := 46ae uint;
}
}
// Chapters
Chapters := 1043a770 container {
EditionEntry := 45b9 container [ card:*; ] {
ChapterAtom := b6 container [ card:*; ] {
ChapterUID := 73c4 uint [ range:1..; ]
ChapterTimeStart := 91 uint;
ChapterTimeEnd := 92 uint;
ChapterFlagHidden := 98 uint [ range:0..1; def:0; ]
ChapterFlagEnabled := 4598 uint [ range:0..1; def:0; ]
ChapterTrack := 8f container {
ChapterTrackNumber := 89 uint [ card:*; range:0..1; ]
ChapterDisplay := 80 container [ card:*; ] {
ChapString := 85 string;
ChapLanguage := 437c string [ card:*; def:"eng";
range:32..126; ]
ChapCountry := 437e string [ card:*; range:32..126; ]
}
}
}
}
}
// Tagging
Tags := 1254c367 container [ card:*; ] {
Tag := 7373 container [ card:*; ] {
Targets := 63c0 container {
TrackUID := 63c5 uint [ card:*; def:0; ]
ChapterUID := 63c4 uint [ card:*; def:0; ]
AttachmentUID := 63c6 uint [ card:*; def:0; ]
}
SimpleTag := 67c8 container [ card:*; ] {
TagName := 45a3 string;
TagString := 4487 string;
TagBinary := 4485 binary;
}
}
}
}
}