How to encode/decode a JPG file to an int sequence?

I want to convert my JPG files to int sequences and then convert them back to get the images again.

My script.sh looks like this:

FILE=$(cat $2)
TOTAL=$(echo ${#FILE} - 1 | bc);
for j in $(seq 0 $TOTAL)
do
    printf "%d " "'${FILE:j:1}" >> sai.out
done

It apparently works fine: sai.out receives something like 32767 32767 32767 32767 16 74 70 73 70 1 1 1 1 32767 32767 67 8 ...

With the same code applied to a text file, decoding is easy: I look up each number in the ASCII table and print it with %c.

The problem is: how can I get the image file back from my sai.out file?

Answer

POSIXly:

od -An -vtu1 < file > file.encoded

Each and every byte of the file is encoded as an unsigned decimal number, with no address column (that is what -An does).
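As a small sketch of what the encoded form looks like, the following feeds od the first 13 bytes of a typical JFIF-format JPEG (FF D8 FF E0 00 10, then "JFIF", then 00 01 01). The bytes are built with printf octal escapes here rather than read from a real image, which is an assumption made only to keep the example self-contained. It should print the numbers 255 216 255 224 0 16 74 70 73 70 0 1 1 (column spacing may vary between od implementations); note that the zero bytes and the values above 127 each get their own number.

```shell
#!/bin/sh
# First 13 bytes of a typical JFIF JPEG, written as printf octal escapes:
# FF D8 FF E0 00 10 'J' 'F' 'I' 'F' 00 01 01.
# od options: -An  no address column
#             -v   print every line (don't collapse repeated lines into '*')
#             -tu1 one unsigned decimal number per byte
printf '\377\330\377\340\000\020JFIF\000\001\001' | od -An -vtu1
```

The -v option matters for real images: without it, od replaces runs of identical output lines with a single `*`, and that information could not be decoded back.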

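To decode, with some awk implementations (those like gawk or mawk where printf("%c", 0) outputs a NUL byte), and in the C locale so that values above 127 are written as single bytes rather than as multibyte characters:

LC_ALL=C awk '{for (i = 1; i <= NF; i++) printf "%c", $i}' < file.encoded > file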
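If no suitable awk is available, a slower alternative that relies only on POSIX printf is to turn each number into a three-digit octal escape (\ddd), which printf then outputs as the corresponding byte, NUL included. The function name decode_bytes and the file names all-bytes.bin and all-bytes.out below are hypothetical, chosen for illustration; the sketch round-trips all 256 byte values through od and compares the result with cmp.

```shell
#!/bin/sh
# decode_bytes (hypothetical helper): read decimal byte values, as produced
# by `od -An -vtu1`, on stdin and write the raw bytes to stdout.
decode_bytes() {
  while read -r line; do
    for n in $line; do   # unquoted on purpose: split the line into numbers
      # e.g. 255 -> "\377"; printf interprets the octal escape as one byte
      printf "\\$(printf '%03o' "$n")"
    done
  done
}

# Build a file containing every byte value from 0 to 255.
i=0
while [ "$i" -le 255 ]; do
  printf "\\$(printf '%03o' "$i")"
  i=$((i + 1))
done > all-bytes.bin

# Encode, decode, and check that the result is identical to the input.
od -An -vtu1 < all-bytes.bin | decode_bytes > all-bytes.out
cmp all-bytes.bin all-bytes.out && echo "round trip OK"
```

Usage on a real image would be `decode_bytes < file.encoded > file`. It runs one command substitution (and so typically one fork) per byte, so it is only practical for small files; the awk command is the better choice when it works.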

A few notes on why your approach can’t work:

  • Shells other than zsh can’t store arbitrary data (in particular the NUL byte) in their variables, which is why no 0 appears in your sai.out even though JPEG files contain many zero bytes.
  • Command substitution in Bourne-like shells strips trailing newline characters (0xa bytes on most systems).
  • You need to quote variables in Bourne-like shells other than zsh: the unquoted $2 in cat $2 is subject to word splitting and filename generation.
  • In shells that have the ksh93 ${var:offset:length} operator (ksh93, bash, zsh, mksh), offset and length are counted in characters, not bytes (though UTF-8 is the only multibyte character encoding mksh supports, and only when its utf8-mode option is enabled).
  • printf %d "'x" outputs the code point of the character x, which is the byte value only in single-byte character sets. Here, you’re probably using bash in a locale that uses the UTF-8 encoding, as bash’s printf gives garbage values for bytes that don’t form part of a valid character there (presumably where the 32767s in your output come from).
  • Text is defined as a sequence of lines, each being a sequence of non-NUL characters (so limited to byte sequences forming valid characters) terminated by a newline character, and whose length in bytes (including the newline) doesn’t exceed LINE_MAX (see getconf LINE_MAX). Your script writes all the numbers on a single line with no terminating newline, so except for very small JPG files, your sai.out would not be valid text, and you would have no guarantee that text utilities process it correctly (od, by contrast, outputs only a few numbers per line).
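Two of the points above can be checked with a short POSIX sketch: command substitution losing trailing newlines, and printf's leading-quote form returning a character code (which, in the single-byte C locale, is the byte value). The variable name x is arbitrary.

```shell
#!/bin/sh
# 1. Command substitution strips trailing newlines: of the three bytes
#    'a', 0xa, 0xa, only 'a' (97) is left in $x.
x=$(printf 'a\n\n')
printf '%s' "$x" | od -An -vtu1

# 2. printf %d "'c" outputs the code of character c. In the C locale,
#    a single-byte locale, that is just the byte value: 'J' gives 74.
LC_ALL=C printf '%d\n' "'J"
```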
