JsonParseException: Invalid UTF-8 middle byte 0x2d

I have a JSON string from which I create an InputStream as shown below, and then I build a GenericRecord from it, because I am trying to serialize my JSON object against an Avro schema.

InputStream input = new ByteArrayInputStream(jsonString.getBytes());
DataInputStream din = new DataInputStream(input);

Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);

DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
// below line is throwing exception
GenericRecord datum = reader.read(null, decoder);   

Below is the exception I am getting:

org.codehaus.jackson.JsonParseException: Invalid UTF-8 middle byte 0x2d at [Source: [email protected]; line: 1, column: 74]

And here is the actual JSON string on which this exception is happening:

{"name":"car_test","attr_value":"2006|Renault|Megane II Coupé-Cabriolet|null|null|null|null|0|Wed Feb 03 10:00:59 GMT-07:00 2016|1|77|null|null|null|null","data_id":900}

I did some research and found out that I need to build the ByteArrayInputStream from UTF-8 encoded bytes, as shown below:

// Passing the Charset itself avoids the checked UnsupportedEncodingException of the String-name overload
InputStream input = new ByteArrayInputStream(jsonString.getBytes(StandardCharsets.UTF_8));

But my question is: what is the reason for this exception, and why does it happen on the JSON string above? Is using UTF-8 the right fix?

What does the error "Invalid UTF-8 middle byte 0x2d" mean?

Answer

You start with a Java Unicode String jsonString.

You then convert it into bytes using String.getBytes(). Since you didn’t specify the encoding, the platform default is used, which in your case is most likely ISO 8859-1.
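
A quick way to check which encoding the no-argument getBytes() is using on a given machine is to compare it with the platform default. This is only a minimal sketch using the JDK; the class name DefaultCharsetDemo and the helper sameAsDefault are made up for illustration. Note that what it prints depends on the JVM and OS settings (newer JDKs default to UTF-8, older ones follow the OS locale).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetDemo {

    // Hypothetical helper: true if the no-argument getBytes() yields the
    // same bytes as encoding explicitly with the given charset.
    static boolean sameAsDefault(String s, Charset cs) {
        return Arrays.equals(s.getBytes(), s.getBytes(cs));
    }

    public static void main(String[] args) {
        String s = "Coup\u00e9"; // "Coupé", written with an escape so the source file encoding does not matter
        System.out.println("Platform default charset: " + Charset.defaultCharset());
        System.out.println("getBytes() matches the default charset: "
                + sameAsDefault(s, Charset.defaultCharset()));
        System.out.println("getBytes() matches UTF-8: "
                + sameAsDefault(s, StandardCharsets.UTF_8));
    }
}
```

If the last line prints false, the bytes handed to the parser are not UTF-8, which is the situation described here.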

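Now you parse the JSON from the (Data)InputStream, and Avro (through Jackson, as the stack trace shows) decodes the bytes as UTF-8. In ISO 8859-1 the é is the single byte 0xE9. In UTF-8, a byte of that form announces a three-byte sequence, so the following bytes must be continuation bytes in the range 0x80 to 0xBF. The next byte is the hyphen (0x2d), which is not a continuation byte, so the parser reports "Invalid UTF-8 middle byte 0x2d".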

So in the end it is a mismatch between the actual encoding (ISO 8859-1) and the expected encoding (UTF-8).
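
To make the mismatch concrete, here is a small sketch that encodes the "é-C" fragment of "Coupé-Cabriolet" both ways and asks a strict UTF-8 decoder whether each result is well formed. It uses only the JDK and does not involve Avro or Jackson; the class name EncodingMismatchDemo and the helpers toHex and isValidUtf8 are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Hypothetical helper: renders bytes as space-separated lowercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Hypothetical helper: true if the bytes are a well-formed UTF-8 sequence.
    // The decoder is configured to report errors instead of silently replacing them.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String fragment = "\u00e9-C"; // "é-C" from "Coupé-Cabriolet"
        byte[] latin1 = fragment.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = fragment.getBytes(StandardCharsets.UTF_8);
        System.out.println("ISO 8859-1: " + toHex(latin1) + "  valid UTF-8? " + isValidUtf8(latin1));
        System.out.println("UTF-8:      " + toHex(utf8) + "  valid UTF-8? " + isValidUtf8(utf8));
    }
}
```

The ISO 8859-1 bytes come out as e9 2d 43, where 0xE9 is immediately followed by 0x2d, and they are rejected as UTF-8. The UTF-8 bytes come out as c3 a9 2d 43 and are accepted.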

You can solve this as you did, or simply avoid going from string to bytes at all:

Decoder decoder = DecoderFactory.get().jsonDecoder(schema, jsonString);
