PHP Classes

PHP Binary Stream: Parse and extract data from binary files

Version: binarystream 1.0.3
License: BSD License
PHP version: 5
Categories: PHP 5, Files and Folders, Parsers
Description 

This class can parse and extract data from binary files.

It can read binary data from a given file and extract data blocks of several types and sizes.

Currently it supports data blocks of types such as bits, characters, big- and little-endian short and long integers of 16, 32 and 64 bits, and 32- and 64-bit floats and doubles.

The class can also read named groups of data type definitions from a configuration file.

Innovation Award

PHP Programming Innovation award nominee, January 2017 (number 8)
Prize: One subscription to the PDF edition of the PHP Architect magazine

Sometimes applications need to parse files that have information in a binary format.

PHP has functions to convert a binary stream of data into individual values, but for whole streams in a specific format, parsing data may be a complicated task.

This class simplifies that process by means of functions that read files and extract values of the expected types.

It also provides means to parse groups of data that have a well known format, by using data group definitions.

Manuel Lemos
Author: Sergey Vanyushin

Example

#!/usr/bin/php
<?php
require __DIR__.'/../vendor/autoload.php';
if (!isset($argv[1])) die('Usage: '.__FILE__.' <mp3file>'.PHP_EOL);

$s = new wapmorgan\BinaryStream\BinaryStream($argv[1]);
$s->loadConfiguration(__DIR__.'/../conf/mp3.conf');

// The first byte of a text frame is its encoding marker; 0x00 means ISO-8859-1.
function convertText($content) {
    return (ord($content[0]) === 0x00)
        ? mb_convert_encoding(substr($content, 1), 'utf-8', 'ISO-8859-1')
        : substr($content, 1);
}

// ID3v2 tag at the start of the file
if ($s->compare(3, 'ID3')) {
    $header = $s->readGroup('id3v2');
    // The size field is a syncsafe integer: each of its 4 bytes carries 7 bits.
    $header_size = '';
    for ($i = 0; $i < 4; $i++) {
        $header_size .= str_pad(decbin(ord($header['size'][$i])), 7, '0', STR_PAD_LEFT);
    }
    $header_size = bindec($header_size);
    $group = ($header['version'] == 2) ? 'id3v232' : 'id3v234';
    $tags_2 = array();
    // Read frames until the zero padding begins
    while (!$s->compare(3, "\00\00\00")) {
        $frame = $s->readGroup($group);
        $frame_content = $s->readString($frame['size']);
        switch ($frame['id']) {
            case 'TIT2': case 'TT2': $tags_2['song'] = convertText($frame_content); break;
            case 'TALB': case 'TAL': $tags_2['album'] = convertText($frame_content); break;
            case 'TPE1': case 'TP1': $tags_2['artist'] = convertText($frame_content); break;
            case 'TYER': case 'TYE': $tags_2['year'] = convertText($frame_content); break;
            case 'COMM': case 'COM':
                // Skip the 3-byte language code, then drop the short description up to the NUL byte
                $frame_content = substr(convertText($frame_content), 3);
                $tags_2['comment'] = strpos($frame_content, "\00") ? substr($frame_content, strpos($frame_content, "\00") + 1) : $frame_content;
                break;
            case 'TRCK': case 'TRK': $tags_2['track'] = convertText($frame_content); break;
            case 'TCON': case 'TCO': $tags_2['genre'] = convertText($frame_content); break;
        }
    }
    var_dump($tags_2);
}

// ID3v1 tag: the last 128 bytes of the file
$s->go(-128);
if ($s->compare(3, 'TAG')) {
    $tags = $s->readGroup('id3v1');
    var_dump(array_map(function ($item) { return trim($item); }, $tags));
}
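
The header-size loop in the script treats the four `size` bytes the way ID3v2 stores them, as a syncsafe integer in which each byte contributes only its low 7 bits. A minimal sketch of the same decoding with shifts, in plain PHP and independent of BinaryStream; the function name `decodeSyncsafe` is hypothetical and the code assumes PHP 7 or later:

```php
<?php
// Hypothetical helper, not part of BinaryStream.
function decodeSyncsafe(string $bytes): int
{
    $value = 0;
    for ($i = 0; $i < strlen($bytes); $i++) {
        // Keep the low 7 bits of each byte; the top bit is 0 in a syncsafe value.
        $value = ($value << 7) | (ord($bytes[$i]) & 0x7F);
    }
    return $value;
}

echo decodeSyncsafe("\x00\x00\x02\x01"), PHP_EOL; // (2 << 7) | 1 = 257
```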


Details

BinaryStream

BinaryStream - a handy tool for working with binary data and the best replacement for pack()/unpack(), with a long list of features.

If you are looking for a convenient tool to read and write binary data (in an existing format or one you created yourself), you have chosen the right library.

_BinaryStream_ is a powerful tool for reading and writing binary files. It supports a variety of high-level data types and sometimes lets you forget that you are working with unstructured binary data.

With BinaryStream you can handle network packets, binary files, system protocols and other low-level data.

  1. Features
  2. Manual
  3. Reference: Data types, API, Groups of fields, Navigation, Auxiliary, Configurations
  4. Advanced usage. Writing

Features

  • Minimal supported PHP version is 5.3.0 for all features.
  • The library supports all major data types and allows both reading and writing data.
  • Supports both direct byte order (big-endian) and reverse (little-endian). You can switch between them while reading a file.
  • Supports integers of several sizes (8, 16, 32 and 64 bits) as well as rare ones (24, 40, 48 and 56 bits).
  • Supports fractional numbers of several sizes (32 and 64 bits).
  • You can read both individual bytes and individual bits.
  • For ease of navigation through the file, you can tell BinaryStream to remember positions in the file and return to them later.
  • Supports data groups: save a configuration once and then read similar data groups using only its name.
  • Supports configuration files to switch between file formats and versions.
  • Unlike standard PHP functions, it can work with fractional numbers written in both the direct byte order (big-endian) and the reverse one (little-endian).

Why is it objectively better than pack()/unpack()?

  • 64-bit integers and floats
  • selectable byte order for floats
  • rare but possible integer sizes (24, 40, 48 and 56 bits)
  • other features like data groups and configurations

And that's all with PHP 5.3.
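
To illustrate what reading a non-standard integer size in a chosen byte order involves, here is a minimal sketch in plain PHP. It does not show how BinaryStream implements this; `readUnsigned` is a hypothetical helper, and the type declarations mean it targets PHP 7 or later rather than 5.3:

```php
<?php
// Hypothetical helper, not part of BinaryStream.
function readUnsigned(string $bytes, bool $bigEndian): int
{
    // Normalise to big-endian so the most significant byte comes first.
    if (!$bigEndian) {
        $bytes = strrev($bytes);
    }
    $value = 0;
    for ($i = 0; $i < strlen($bytes); $i++) {
        $value = ($value << 8) | ord($bytes[$i]);
    }
    return $value;
}

$raw = "\x01\x02\x03"; // 24 bits
printf("%06X\n", readUnsigned($raw, true));  // 010203
printf("%06X\n", readUnsigned($raw, false)); // 030201
```

The same loop covers 40, 48 and 56 bits; only the number of input bytes changes.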

Manual

Simple usage

The easiest way to use BinaryStream is this:

use wapmorgan\BinaryStream\BinaryStream;
$stream = new BinaryStream('filename.ext');
$text = $stream->readString(20);

This example reads 20 bytes at the beginning of the file as a string.

A more complex example, where the data is laid out in the following order:

- integer (int, 32 bits)
- float (float, 32 bits)
- flag byte (8 bits, where each bit has its own meaning): the first bit indicates whether more data follows this byte, the next 5 bits are unused, and the last 2 bits give the data type:
  - `0b00` - followed by 1 character (char, 8 bits)
  - `0b01` - followed by 10 characters (string, 10 bytes)
  - `0b10` - followed by a time in unixtime format stored in a long integer (long, 64 bits)
  - `0b11` - not used at the moment.

To read this data, including the parts that depend on the flags, the following example is suitable:

use wapmorgan\BinaryStream\BinaryStream;
$stream = new BinaryStream('filename.ext');
$int = $stream->readInteger(32);
$float = $stream->readFloat(32);
$flags = $stream->readBits([
    'additional_data' => 1,
    '_' => 5, // pointless data
    'data_type' => 2,
]);
if ($flags['additional_data']) {
    if ($flags['data_type'] == 0b00)
        $char = $stream->readChar();
    else if ($flags['data_type'] == 0b01)
        $string = $stream->readString(10);
    else if ($flags['data_type'] == 0b10)
        $time = date('r', $stream->readInteger(64));
}

In this example, we read the basic data and then the additional data, depending on the flag values.
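
The flag byte in this layout can also be taken apart by hand with shifts and masks. A minimal sketch, assuming the "first bit" is the most significant bit of the byte (the text does not state the bit order); `splitFlags` is a hypothetical helper, not a BinaryStream method:

```php
<?php
// Hypothetical helper, not part of BinaryStream.
function splitFlags(int $byte): array
{
    return [
        'additional_data' => (bool) (($byte >> 7) & 0b1), // highest bit
        'data_type'       => $byte & 0b11,                // lowest two bits
    ];
}

$flags = splitFlags(0b10000010);
var_dump($flags['additional_data']); // bool(true)
var_dump($flags['data_type']);       // int(2)
```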

But real files rarely contain so little data. For added convenience, you can use the group reading function. The previous example can be rewritten as follows:

use wapmorgan\BinaryStream\BinaryStream;
$stream = new BinaryStream('filename.ext');
$data = $stream->readGroup([
    'i:int' => 32,
    'f:float' => 32,
    'additional_data' => 1,
    '_' => 5,
    'data_type' => 2,
]);
if ($data['additional_data']) {
    if ($data['data_type'] == 0b00)
        $data['char'] = $stream->readChar();
    else if ($data['data_type'] == 0b01)
        $data['string'] = $stream->readString(10);
    else if ($data['data_type'] == 0b10)
        $data['time'] = date('r', $stream->readInteger(64));
}

If you are reading a file in which such groups of data are repeated, you can save a group under a name and then simply refer to it by name to read the next data. Let us introduce one more value for data_type: 0b11 means that this is the last group of data in the file. An example would be:

use wapmorgan\BinaryStream\BinaryStream;
$stream = new BinaryStream('filename.ext');
$stream->saveGroup('Data', [
    'i:int' => 32,
    'f:float' => 32,
    'additional_data' => 1,
    '_' => 5,
    'data_type' => 2,
]);

do {
    $data = $stream->readGroup('Data');
    // Some operations with data
} while ($data['data_type'] != 0b11);
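
To try such a loop you need a file with this layout. A minimal sketch that builds one record with PHP's own `pack()`, assuming little-endian byte order and a flag byte whose first bit is the most significant one; the `g` format code (little-endian float) needs PHP 7.0.15 or later, and `buildRecord` is a hypothetical helper:

```php
<?php
// Hypothetical helper for producing test data; not part of BinaryStream.
function buildRecord(int $i, float $f, bool $additionalData, int $dataType): string
{
    $flags = ($additionalData ? 0b10000000 : 0) | ($dataType & 0b11);
    return pack('V', $i)   // 32-bit unsigned integer, little-endian
         . pack('g', $f)   // 32-bit float, little-endian
         . chr($flags);    // 1 flag bit, 5 unused bits, 2 type bits
}

$record = buildRecord(42, 1.5, false, 0b11); // 0b11 marks the last group
echo strlen($record), PHP_EOL;  // 9
echo bin2hex($record), PHP_EOL; // 2a0000000000c03f03
```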

Now imagine that we have moved to a new file format that differs from the previous one and has a marker at the beginning of the file that distinguishes the new format from the old one. For example, the new marker is the character sequence 'A', 'S', 'C'. We need to check for the marker: if it is present, parse the file according to the new scheme; if it is not, use the old parser. An example to illustrate this:

use wapmorgan\BinaryStream\BinaryStream;
$stream = new BinaryStream('filename.ext');

if ($stream->compare(3, 'ASC')) {
    // parse here new format
} else {
    $stream->saveGroup('DataOld', [
        'i:int' => 32,
        'f:float' => 32,
        'additional_data' => 1,
        '_' => 5,
        'data_type' => 2,
    ]);

    do {
        $data = $stream->readGroup('DataOld');
        // Some operations with data
    } while ($data['data_type'] != 0b11);
}
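
The signature check relies on comparing bytes without consuming them. A minimal sketch of that idea with plain stream functions, assuming nothing about how BinaryStream does it internally; `peekEquals` is a hypothetical helper:

```php
<?php
// Hypothetical helper, not part of BinaryStream.
function peekEquals($handle, string $expected): bool
{
    $position = ftell($handle);
    $actual = fread($handle, strlen($expected));
    fseek($handle, $position); // restore the cursor so later reads start at the same place
    return $actual === $expected;
}

$handle = fopen('php://memory', 'w+b');
fwrite($handle, "ASC\x01\x02\x03");
rewind($handle);

var_dump(peekEquals($handle, 'ASC')); // bool(true)
var_dump(ftell($handle));             // int(0)
```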

Installation

Installation via composer:

composer require wapmorgan/binary-stream

Reference

Data types

All used data types are presented in the following table:

| Type    | Dimensions      | Values range                                            | Notes |
|---------|-----------------|---------------------------------------------------------|-------|
| integer | 8/16/32/64 bits | 0 to 255/65 535/4 294 967 295/9 223 372 036 854 775 807 | Also, there's support for non-standard sizes like 24, 40, 48 and 56 bits. |
| float   | 32/64 bits      | 0 to 3.4 x 10^38/1.798 x 10^308                         | Also, there's support for choosing byte-order when storing a float (unlike pack()). |
| char    | 1 byte          | From 0 to 255 ascii chars                               | - |
| string  | [n] of bytes    | ...                                                     | - |
| bit     | [n] of bits     | 0 or 1                                                  | Also, there's support for combining few consecutive bits in one value. |

API

  • Creating an object is possible in several ways: `new BinaryStream($filename | $socket | $stream)`
  • Reading data is possible using specialized methods for each data type:
    - bit:
      - `readBit(): boolean`
        Example: `$flag = $s->readBit();`
      - `readBits(array $listOfBits): array of boolean and integers`
        Example: `$flags = $s->readBits(['a' => 2, '_' => 5, 'b' => 3]);`
        If the size of a field (the array element value) is `1`, the field will hold `true/false`; if it is larger than 1, the `N` consecutive bits are combined into an `integer`.
    - char:
      - `readChar(): string(1)`
        Example: `$char = $s->readChar();`
      - `readChars($count): array of string(1)`
        Example: `$chars = $s->readChars(4);`
    - integer:
      - `readInteger($sizeInBits = 32): integer`
        Example: `$int = $s->readInteger(32);`
        It supports the following dimensions: 8, 16, 32, 64 and 24, 40, 48, 56 bits.
    - float:
      - `readFloat($sizeInBits = 32): float`
        Example: `$float = $s->readFloat(32);`
        It supports the following dimensions: 32, 64 bits.
    - string:
      - `readString($length): string($length)`
        Example: `$string = $s->readString(10);`
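
The `readBits()` rule (size 1 gives a boolean, larger sizes give an integer) can be mimicked in plain PHP as follows. This is a sketch of the documented behaviour, not the library's code; it assumes bits are consumed most significant first, and `readBitFields` is a hypothetical name:

```php
<?php
// Hypothetical helper, not part of BinaryStream.
function readBitFields(string $bytes, array $fields): array
{
    // Expand the bytes into a string of '0'/'1' characters, 8 per byte.
    $bits = '';
    foreach (str_split($bytes) as $byte) {
        $bits .= str_pad(decbin(ord($byte)), 8, '0', STR_PAD_LEFT);
    }

    $result = [];
    $offset = 0;
    foreach ($fields as $name => $size) {
        $chunk = substr($bits, $offset, $size);
        $offset += $size;
        // One bit becomes a boolean, several bits become one integer.
        $result[$name] = ($size === 1) ? ($chunk === '1') : bindec($chunk);
    }
    return $result;
}

// Bits: 11 00000 101 (followed by 6 unread bits)
echo json_encode(readBitFields("\xC1\x40", ['a' => 2, '_' => 5, 'b' => 3])), PHP_EOL;
// {"a":3,"_":0,"b":5}
```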

Groups of fields

You can save a list of field definitions under a specific name and use that name whenever you need to read the same block several times. A group is defined by a group configuration: a list of fields with their types and sizes. To compose a group configuration, create an array in which the keys define the type and name of the fields and the values define their size:

  • key is the name of the field and may contain the type of the field. To specify the type, prepend the name with a type letter and a colon. Type letters:
    - `b` - bit
    - `i` - integer
    - `f` - float
    - `c` - char
    - `s` - string
    If the type is not specified, the field is treated as a `bit`-field.
    Example: `flag` - `bit`-field, `s:name` - `string`-field

  • value is the size or dimension of the field:
    - If the field has `integer`, `float` or `bit` type, it defines the size of the field in bits.
    - If the field has `char` or `string` type, it defines the size in bytes.
    Example:
    'flags' => 16, // bits-field (16 bits = 2 bytes)
    's:name' => 10, // string-field (80 bits = 10 bytes)


So a full example of a group configuration:

$group = [
    'flags' => 16,
    'i:date' => 32,
    'f:summ' => 32,
    's:name' => 10,
];
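
The key and size rules can be checked with a few lines of plain PHP. This sketch only restates the rules given in this section; `parseFieldKey` and `groupSizeInBits` are hypothetical helpers, not BinaryStream methods, and the code assumes PHP 7.1 or later:

```php
<?php
$group = [
    'flags'  => 16,
    'i:date' => 32,
    'f:summ' => 32,
    's:name' => 10,
];

// Split "type:name" keys; keys without a type prefix default to bit fields.
function parseFieldKey(string $key): array
{
    $types = ['b' => 'bit', 'i' => 'integer', 'f' => 'float', 'c' => 'char', 's' => 'string'];
    if (strlen($key) > 2 && $key[1] === ':' && isset($types[$key[0]])) {
        return [$types[$key[0]], substr($key, 2)];
    }
    return ['bit', $key];
}

// Total size in bits: char and string sizes are given in bytes, the rest in bits.
function groupSizeInBits(array $group): int
{
    $bits = 0;
    foreach ($group as $key => $size) {
        [$type] = parseFieldKey($key);
        $bits += in_array($type, ['char', 'string'], true) ? $size * 8 : $size;
    }
    return $bits;
}

echo groupSizeInBits($group) / 8, PHP_EOL; // 20 (bytes): 2 + 4 + 4 + 10
```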

| Method | Usage | Notes |
|--------|-------|-------|
| saveGroup($name, array $groupConfiguration) | $s->saveGroup('data', ['i:field' => 32, 's:text' => 20]); | Creates a new group with the given fields. If a group with that name already exists, it replaces the original group. |
| readGroup($name) | $data = $s->readGroup('data'); | Reads data using a previously saved configuration. To save a group under a new name, use the method saveGroup($name, array $fields). |
| readGroup(array $groupConfiguration) | $data = $s->readGroup(['i:field' => 32, 's:text' => 20]); | The fields are listed as an array in which the keys determine the type and name of the data fields and the values their dimension (in bytes for strings and chars, in bits for everything else). Supported types: s, c, i, f and b. If the type is not specified, the field is treated as a bit (or a few bits). The type and name are separated by a colon (:). |

Navigation

  • Caret moving: To change the position of the cursor in the file use the following methods.

| Method | Usage | Notes |
|--------|-------|-------|
| go($offset) | $stream->go(-128); | Goes to the absolute position in the file. If you pass a negative value, the cursor is set to -$offset bytes from the end of the file. |
| go($mark) | $stream->go('FirstTag'); | Moves to the position where the $mark mark has been set. |
| skip($bytes) | $stream->skip(4); | Skips the following $bytes bytes. |

  • Current position testing:
    isEnd(): boolean
    
    Returns true if cursor is at the end of file.
  • Remembering the positions in file:

| Method | Usage | Notes |
|--------|-------|-------|
| mark($name) | $stream->mark('Tag'); | Saves the current cursor position under the name $name. |
| markOffset($offset, $name) | $stream->markOffset(-128, 'FirstTag'); | Saves a specific position in the file under the name $name. |
| isMarked($name) | $stream->isMarked('Tag'); | Checks whether the $name mark is set. Returns true or false. |
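
The offset and mark semantics described in this section can be sketched with plain stream functions. This is an illustration under assumptions, not the library's implementation; the `Cursor` class and its method names are hypothetical, and the code assumes PHP 7.1 or later:

```php
<?php
// Hypothetical class, not part of BinaryStream.
class Cursor
{
    private $handle;
    private $marks = [];

    public function __construct($handle)
    {
        $this->handle = $handle;
    }

    // Non-negative offsets are absolute; negative ones count back from the end.
    public function go(int $offset): void
    {
        fseek($this->handle, $offset, $offset < 0 ? SEEK_END : SEEK_SET);
    }

    // Remember the current position under a name.
    public function mark(string $name): void
    {
        $this->marks[$name] = ftell($this->handle);
    }

    // Return to a previously remembered position.
    public function goToMark(string $name): void
    {
        fseek($this->handle, $this->marks[$name]);
    }

    public function position(): int
    {
        return ftell($this->handle);
    }
}

$handle = fopen('php://memory', 'w+b');
fwrite($handle, str_repeat("\x00", 200)); // a 200-byte stand-in for a file

$cursor = new Cursor($handle);
$cursor->go(-128);
$cursor->mark('FirstTag');
echo $cursor->position(), PHP_EOL; // 72
$cursor->go(10);
$cursor->goToMark('FirstTag');
echo $cursor->position(), PHP_EOL; // 72
```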

Auxiliary

  • Comparison of bytes:
    compare($length, $bytes)
    
    Compares `$length` bytes from the current position with `$bytes`. The current position is not changed. Returns true or false.
  • Endianness: By default, BinaryStream treats ints and longs as little-endian. To change the byte order used for reading, call the `setEndian($endian)` method with one of the `BinaryStream` constants:

| Constant | Meaning |
|----------|---------|
| BinaryStream::BIG | Big-endian for integers and floats |
| BinaryStream::LITTLE | Little-endian for integers and floats |
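
For comparison, newer PHP versions can decode floats in an explicit byte order with `unpack()`: the `G`/`g` (32-bit) and `E`/`e` (64-bit) format codes were added in PHP 7.0.15 and 7.1.1. A minimal sketch in plain PHP, independent of BinaryStream, showing that the two byte orders are byte-reversed views of the same value:

```php
<?php
$bigEndian = "\x3F\xC0\x00\x00";    // IEEE 754 single-precision 1.5, most significant byte first
$littleEndian = strrev($bigEndian); // the same value, least significant byte first

var_dump(unpack('G', $bigEndian)[1]);    // float(1.5)
var_dump(unpack('g', $littleEndian)[1]); // float(1.5)
```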

Configurations

| Method | Usage | Notes |
|--------|-------|-------|
| loadConfiguration($file) | $stream->loadConfiguration('file_format.conf'); | Loads a configuration (byte order and data groups) from an external file. The configuration format is ini. To see an example of such a file, open the conf/mp3.conf file. |
| saveConfiguration($file) | $stream->saveConfiguration('file_format.conf'); | Saves the current byte order setting and all created data groups to an external file in ini format. This configuration can later be restored from the file with the method loadConfiguration(). |

Advanced usage. Writing

If you need to write data to binary files, you can use additional methods to do so.

First, you need to open the file in one of the modes that allow writing (by default, files are opened in read-only mode). To do this, pass one of the following modes as the second argument when you create a BinaryStream object:

| Mode | Constant | Notes |
|------|----------|-------|
| Creation | BinaryStream::CREATE | Creates new file. Fails when file already exists. |
| Recreation | BinaryStream::RECREATE | Clears all file content and sets cursor at the beginning. Fails when file doesn't exist. |
| Rewriting | BinaryStream::REWRITE | Opens file and sets cursor at the beginning. Fails when file doesn't exist. |
| Appending | BinaryStream::APPEND | Opens file and sets cursor at the end. Fails when file doesn't exist. |

After you have opened the file in a writable mode, you can use the following methods, named by analogy with the ones designed for reading.

| Data type | Method | Example | Notes |
|-----------|--------|---------|-------|
| bit | writeBit($bit) | $s->writeBit(true); | |
| | writeBits(array $bits) | $s->writeBits([true, false, [2, 2], [4, 10]]); | You can combine multiple bits into a single number. To do this, pass an array instead of a boolean: its first element is the number of bits used to store the number, and its second element is the number itself. |
| char | writeChar($char) | $s->writeChar(32); | You can pass either a character (string) or the code of the character (an integer from 0 to 255). |
| integer | writeInteger($integer, $sizeInBits) | $s->writeInteger(256, 32); | It supports the following dimensions: 8, 16, 32, 64 bits. |
| float | writeFloat($float, $sizeInBits) | $s->writeFloat(123.123, 32); | It supports the following dimensions: 32, 64 bits. |
| string | writeString($string) | $s->writeString('Abracadabra'); | |
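
The `writeBits()` argument format can be illustrated by packing the same list by hand. This is a sketch of the documented input format only; it assumes bits are written most significant first and that an incomplete final byte is padded with zeros, neither of which the text confirms, and `packBits` is a hypothetical helper:

```php
<?php
// Hypothetical helper, not part of BinaryStream.
function packBits(array $bits): string
{
    $bitString = '';
    foreach ($bits as $bit) {
        if (is_array($bit)) {
            // [size in bits, value]: write the value using exactly $size bits.
            [$size, $value] = $bit;
            $bitString .= str_pad(decbin($value), $size, '0', STR_PAD_LEFT);
        } else {
            $bitString .= $bit ? '1' : '0';
        }
    }

    // Pad with zeros on the right up to a whole number of bytes.
    $length = (int) ceil(strlen($bitString) / 8) * 8;
    $bitString = str_pad($bitString, $length, '0', STR_PAD_RIGHT);

    $bytes = '';
    foreach (str_split($bitString, 8) as $chunk) {
        $bytes .= chr(bindec($chunk));
    }
    return $bytes;
}

// 1, 0, then 2 as two bits (10), then 10 as four bits (1010) -> 10101010
echo bin2hex(packBits([true, false, [2, 2], [4, 10]])), PHP_EOL; // aa
```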


Files

  • bin/mp3reader - Example script
  • conf/mp3.conf - Auxiliary data
  • src/BinaryStream.php - Class source
  • tests/CompareTest.php, tests/MarkTest.php, tests/ReaderTest.php, tests/WriterTest.php - Class source
  • .travis.yml - Auxiliary data
  • composer.json - Auxiliary data
  • LICENSE - License text
  • README.md - Documentation
