C/C++ Compressing Integer to Short and Decompressing to Integer [closed]

I’m trying to find a way to send a value through the network that is 16bit in length (short), whilst its original value is determined by 32bit in length (Integer).

The idea is to save traffic size through the network as much as possible by compressing/decompressing values for Client and Server communications.

I was thinking of taking the Integer value and shifting its bit to right and in the client it will shift them to left. However this will not bring accurate results when the 16 bits on the left are lost.

I’m not well familiar with compressions/decompression algorithms, but I have a good feeling that this is possible.

Code example:

#include <stdio.h>
void printBits(size_t const size, void const * const ptr)
{
    unsigned char *b = (unsigned char*) ptr;
    unsigned char byte;
    int i, j;
    
    for (i = size-1; i >= 0; i--) {
        for (j = 7; j >= 0; j--) {
            byte = (b[i] >> j) & 1;
            printf("%u", byte);
        }
    }
    puts("");
}

////////////////////////////////////////
short Server(int value) {
    return value >> 3;
}

int Client(short value) {
    return value << 3;
}

int main()
{
    int value = 150000;
    printBits(sizeof(value), &value);
    printf("Before=%d, After=%d", Server(value), Client(Server(value)));
    return 0;
}

This brings the expected results, but will not be accurate when the relevant bits are stripped from the left side through the conversion from Integer to Short.

Maybe there is away by XORing these values or different ways to operate on the bits. I have the feeling that it will never be precise, but I’m aiming for getting as close as possible to the original results.

Answer

For 16 bits, there are 216 = 65,536 possible settings. Each setting of bits can represent at most one value. So 16 bits of data can represent only 65,536 values.

32 bits have 232 = 4,294,967,296 possible settings. If your 32-bit data uses more than 65,536 of the possible values that could be represented, it is not possible to compress it to 16 bits and always recover the original value when decompressing it.

Options include:

  • If your 32 bits of data use no more than 65,536 values, you can create a code that maps each used value to a number from 0 to 65,535. Encoding the value then compresses the data to 16 bits, and decoding the value decompresses it.
  • If full accuracy is not required, you can map nearby 32-bit values to a single 16-bit value. For example, if values from 0 to 150,000 are used, you can shift right by two bits to get a value from 0 to 37,500, which will fit in a 16-bit unsigned short. You can recover approximately the original value by shifting left two bits. You could also divide by 3 instead of shifting (which is equivalent to dividing by 4), and that would reduce the error somewhat, but division takes longer than shifting in many processors. Division by 3 still produces a result that fits in 16 bits because 150,000/3 < 65536. You could also divide by 2.3 to reduce error further, but that is more work.
  • Do not use compression, because it is not possible to obtain the desired result of compressing a 32-bit value into 16 bits.

Regarding the shifting-and-losing-accuracy solution: Sometimes it is preferred to add a bit to the decompressed value to reduce the average and maximum error. For example, 100, 101, 102, and 103 would all be compressed to the same value, 25. Then simple recovery by multiplying by four would produce 100. This has errors of 0, −1, −2, and −3 for 100, 101, 102, and 103, respectively. (Average error magnitude is 1½, maximum is 3.) If we add 1 after multiplying, 25 decompresses to 101, and then the errors are +1, 0, −1, and −2 (average 1, maximum 2).