leb: LEB128 utilities for Node
==============================

This Node module provides several utility functions for
dealing with the LEB128 family of integer representation formats.

LEB128, which is short for "Little-Endian Base 128", is somewhat like
UTF-8 in representing numbers using a variable number of bytes. Unlike
UTF-8, LEB128 uses just the high bit of each byte to determine the
role of a byte. This makes it a somewhat more compact representation,
at some cost in complexity on the read side.

LEB128 was first defined as part of the
[DWARF 3 specification](http://dwarfstd.org/Dwarf3Std.php), and it
is also used in Android's
[DEX file format](https://source.android.com/docs/core/runtime/dex-format).

This module provides encoders and decoders for both signed and
unsigned values. The decoded form can be a 32-bit integer, a 64-bit
integer, or an arbitrary-length buffer (taken to be a bigint-style
representation in little-endian order).

The 64-bit integer variants require a special note: Because JavaScript
can't represent all possible 64-bit integers in its native number
type, the 64-bit decoder methods return a `lossy` flag which indicates
if the decoded result isn't exactly the number represented in the
encoded form.
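
To illustrate what such a flag conveys, here is a minimal sketch that
uses `BigInt` to check whether a 64-bit value survives conversion to a
JavaScript number. It is not this module's implementation, and the
helper name `toNumberWithLossFlag` is hypothetical.

```javascript
// Hypothetical helper, for illustration only: convert an exact 64-bit
// integer (held as a BigInt) to a JavaScript number, and report whether
// the conversion changed the value.
function toNumberWithLossFlag(exact) {
  const value = Number(exact);
  // Converting back recovers the original only if no precision was lost.
  const lossy = BigInt(value) !== exact;
  return { value, lossy };
}

// 2^53 is exactly representable as a double; 2^53 + 1 is not.
console.log(toNumberWithLossFlag(2n ** 53n).lossy);      // false
console.log(toNumberWithLossFlag(2n ** 53n + 1n).lossy); // true
```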


## Format Details

The LEB128 format is really quite simple.

An encoded value is a series of bytes where the high bit (bit #7 or
`0x80`) is set on each byte but the final one. The other seven bits
of each byte are the payload bits.

To interpret an encoded value, one concatenates the payload bits
in little-endian order (so the *first* payload byte contains the
*least* significant bits). After that, if the encoded value is
a signed representation, one sign-extends the result.
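
The interpretation steps above can be sketched in a few lines. This is
an illustrative decoder, not this module's implementation; the name
`decodeLeb128Sketch` is hypothetical, and it returns a `BigInt` so that
values of any size stay exact.

```javascript
// Illustrative decoder, not this module's implementation.
// `bytes` is any iterable of byte values (an array or a Buffer).
function decodeLeb128Sketch(bytes, signed) {
  let result = 0n;
  let shift = 0n;
  for (const byte of bytes) {
    // The low seven bits are payload; earlier bytes are less significant.
    result |= BigInt(byte & 0x7f) << shift;
    shift += 7n;
    if ((byte & 0x80) === 0) {
      // Final byte. For the signed form, treat the top payload bit as
      // the sign (two's complement over `shift` bits).
      return signed ? BigInt.asIntN(Number(shift), result) : result;
    }
  }
  throw new Error('truncated LEB128 value: last byte has its high bit set');
}

console.log(decodeLeb128Sketch([0x45], false)); // 69n
console.log(decodeLeb128Sketch([0x45], true));  // -59n
```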

Schematically, here are the one-byte encodings:

```
               +--------+
encoded        |0GFEDCBA|
               +--------+

unsigned       +--------+
interpretation |0GFEDCBA|
               +--------+

signed         +--------+
interpretation |GGFEDCBA|
               +--------+
```

That is: The unsigned interpretation of a single-byte encoding is the
byte value itself. The signed interpretation is of the value as a
signed seven-bit integer.

Similarly, here are the two-byte encodings:

```
               +--------+ +--------+
encoded        |1GFEDCBA| |0NMLKJIH|
               +--------+ +--------+

unsigned       +----------------+
interpretation |00NMLKJIHGFEDCBA|
               +----------------+

signed         +----------------+
interpretation |NNNMLKJIHGFEDCBA|
               +----------------+
```

That is: The unsigned interpretation of a two-byte encoding is
a 14-bit integer consisting of the first-byte payload bits and
second-byte payload bits concatenated together. The signed
interpretation is the same as the unsigned, except that bit #13
is treated as the sign and is hence extended to fill the remaining
bits.

Some concrete examples (all numbers are hex):

```
encoded      unsigned          signed
bytes        interpretation    interpretation
-------      --------------    --------------
10           +10               +10
45           +45               -3b
8e 32        +190e             +190e
c1 57        +2bc1             -143f
80 80 80 3f  +7e00000          +7e00000
80 80 80 4f  +9e00000          -6200000
```
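
Going the other way, the rows of this table can be reproduced with a
small encoder. The following is a sketch under the assumption that the
shortest possible encoding is wanted; it is not this module's
implementation, and `encodeLeb128Sketch` and `hex` are hypothetical
names.

```javascript
// Illustrative encoder, not this module's implementation. `value` is a
// BigInt; the result is an array of byte values.
function encodeLeb128Sketch(value, signed) {
  if (!signed && value < 0n) {
    throw new RangeError('unsigned encoding requires a non-negative value');
  }
  const bytes = [];
  for (;;) {
    const payload = Number(value & 0x7fn); // low seven bits
    value >>= 7n;                          // arithmetic shift keeps the sign
    // Signed: stop once the rest is pure sign extension of payload bit 6.
    // Unsigned: stop once no bits remain.
    const done = signed
      ? (value === 0n && (payload & 0x40) === 0) ||
        (value === -1n && (payload & 0x40) !== 0)
      : value === 0n;
    bytes.push(done ? payload : payload | 0x80);
    if (done) return bytes;
  }
}

// Format byte values as space-separated two-digit hex.
const hex = (bytes) =>
  bytes.map((b) => b.toString(16).padStart(2, '0')).join(' ');

console.log(hex(encodeLeb128Sketch(0x2bc1n, false)));    // c1 57
console.log(hex(encodeLeb128Sketch(-0x143fn, true)));    // c1 57
console.log(hex(encodeLeb128Sketch(-0x6200000n, true))); // 80 80 80 4f
```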


## Building and Installing

```shell
npm install leb
```

Or grab the source. As of this writing, this module has no
dependencies, so once you have the source, there's nothing more to do
to "build" it.


## Testing

```shell
npm test
```

Or run the test file directly:

```shell
node ./test/test.js
```


## API Details


### decodeInt32(buffer, [index]) -> { value: num, nextIndex: num }

Takes a signed LEB128-encoded byte sequence in the given buffer at the
given index (defaults to `0`), returning the decoded value and the
index just past the end of the encoded form. The value is expected to
be a 32-bit integer.

This throws an exception if the buffer doesn't have a valid encoding
at the index (which can only happen if the last byte in the buffer has
its high bit set) or if the decoded value is out of the range of the
expected type.
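
The `nextIndex` result makes it straightforward to read several values
packed back to back. The sketch below shows that pattern with a
stand-in function, `decodeInt32Sketch`, which is hypothetical and only
mimics the return shape and error conditions described above; it is
not this module's code.

```javascript
// Stand-in with the same return shape as decodeInt32, for illustration
// only; it is not this module's implementation.
function decodeInt32Sketch(buffer, index = 0) {
  let result = 0n;
  let shift = 0n;
  for (let i = index; i < buffer.length; i++) {
    result |= BigInt(buffer[i] & 0x7f) << shift;
    shift += 7n;
    if ((buffer[i] & 0x80) === 0) {
      // Sign-extend from the top payload bit, then range-check.
      const value = BigInt.asIntN(Number(shift), result);
      if (value < -(2n ** 31n) || value > 2n ** 31n - 1n) {
        throw new RangeError('decoded value does not fit in 32 bits');
      }
      return { value: Number(value), nextIndex: i + 1 };
    }
  }
  throw new Error('no valid encoding at index ' + index);
}

// Three signed values packed back to back: 0x10, -0x3b, -0x143f.
const packed = Buffer.from([0x10, 0x45, 0xc1, 0x57]);
const values = [];
for (let index = 0; index < packed.length; ) {
  const { value, nextIndex } = decodeInt32Sketch(packed, index);
  values.push(value);
  index = nextIndex; // continue just past the value that was read
}
console.log(values); // [ 16, -59, -5183 ]
```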

### decodeInt64(buffer, [index]) -> { value: num, nextIndex: num, lossy: bool }

Takes a signed LEB128-encoded byte sequence in the given buffer at the
given index (defaults to `0`), returning the decoded value, the index
just past the end of the encoded form, and a boolean indicating
whether the decoded value experienced numeric conversion loss. The
value is expected to be a 64-bit integer.

This throws an exception if the buffer doesn't have a valid encoding
at the index (which can only happen if the last byte in the buffer has
its high bit set) or if the decoded value is out of the range of the
expected type.

### decodeIntBuffer(encodedBuffer, [index]) -> { value: buffer, nextIndex: num }

Takes a signed LEB128-encoded byte sequence in the given buffer at the
given index (defaults to `0`), returning the decoded value and the
index just past the end of the encoded form. The decoded value is a
bigint-style buffer representing a signed integer, in little-endian
order.
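
As a rough illustration of that buffer form, the sketch below turns a
little-endian byte buffer into a `BigInt`. It assumes that
"bigint-style" means two's-complement bytes with the least significant
byte first; the function name `signedBufferToBigInt` is hypothetical
and not part of this module.

```javascript
// Illustration only: interpret a little-endian, two's-complement byte
// buffer as a signed BigInt. The function name is hypothetical.
function signedBufferToBigInt(buffer) {
  let result = 0n;
  // Walk from the most significant (last) byte down to the first.
  for (let i = buffer.length - 1; i >= 0; i--) {
    result = (result << 8n) | BigInt(buffer[i]);
  }
  // Reinterpret the top bit of the last byte as the sign.
  return BigInt.asIntN(buffer.length * 8, result);
}

console.log(signedBufferToBigInt(Buffer.from([0xc1, 0xeb]))); // -5183n
console.log(signedBufferToBigInt(Buffer.from([0x0e, 0x19]))); // 6414n
```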

This throws an exception if the buffer doesn't have a valid encoding
at the index (which can only happen if the last byte in the buffer has
its high bit set).

### decodeUint32(buffer, [index]) -> { value: num, nextIndex: num }

Like `decodeInt32`, but with the unsigned LEB128 format and unsigned
32-bit integer type.

### decodeUint64(buffer, [index]) -> { value: num, nextIndex: num, lossy: bool }

Like `decodeInt64`, but with the unsigned LEB128 format and unsigned
64-bit integer type.

### decodeUintBuffer(encodedBuffer, [index]) -> { value: buffer, nextIndex: num }

Like `decodeIntBuffer`, but with the unsigned LEB128 format.

### encodeInt32(num) -> buffer

Takes a 32-bit signed integer, returning the signed LEB128 representation
of it.

### encodeInt64(num) -> buffer

Takes a 64-bit signed integer, returning the signed LEB128 representation
of it.

### encodeIntBuffer(buffer) -> encodedBuf

Takes a bigint-style buffer representing a signed integer, returning the
signed LEB128 representation of it.

### encodeUint32(num) -> buffer

Like `encodeInt32`, but with the unsigned 32-bit integer type and returning
unsigned LEB128.

### encodeUint64(num) -> buffer

Like `encodeInt64`, but with the unsigned 64-bit integer type and returning
unsigned LEB128.

### encodeUintBuffer(buffer) -> encodedBuf

Like `encodeIntBuffer`, but with the buffer argument in unsigned bigint form
and returning unsigned LEB128.


## Contributing

Questions, comments, bug reports, and pull requests are all welcome.

Bug reports that include steps-to-reproduce (including code) are the
best. Even better, make them in the form of pull requests that update
the test suite. Thanks!

- - - - - - - - - -
```
Copyright 2012-2024 the Leb Authors (Dan Bornstein et alia).
SPDX-License-Identifier: Apache-2.0
```