Spring Batch – StaxEventItemReader: handling invalid records in XML

I’m using Spring Batch to read XML files, and I want to validate each record against an XSD. I can enable validation with setSchema, but an invalid record throws an exception and kills the whole job. My goal is to handle these invalid records: log them and skip them, so that only valid records reach the final processing.

My StaxEventItemReader:

    @Bean
    @JobScope
    public StaxEventItemReader<BookDto> reader() throws Exception {
        // JAXB marshaller that validates each fragment against book.xsd
        Jaxb2Marshaller jaxb2Marshaller = new Jaxb2Marshaller();
        jaxb2Marshaller.setClassesToBeBound(BookDto.class);
        jaxb2Marshaller.setSchema(new ClassPathResource("book.xsd"));
        jaxb2Marshaller.afterPropertiesSet();

        // Read one <book> fragment at a time and unmarshal it to BookDto
        return new StaxEventItemReaderBuilder<BookDto>()
                .name("xmlReader")
                .resource(new ClassPathResource("books.xml"))
                .addFragmentRootElements("book")
                .unmarshaller(jaxb2Marshaller)
                .build();
    }

XSD

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://www.test.com/xsd"
  xmlns="http://www.test.com/xsd"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book" type="bookDto"/>
  <xs:simpleType name="simAuthor">
    <xs:restriction base="xs:string">
      <xs:maxLength value="10"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="bookDto">
    <xs:sequence>
      <xs:element name="author" type="simAuthor" minOccurs="0"/>
      <xs:element type="xs:float" name="price"/>
    </xs:sequence>
    <xs:attribute type="xs:string" name="id" use="required"/>
  </xs:complexType>
</xs:schema>

Items

<?xml version="1.0"?>
<catalog>
  <book xmlns="http://www.test.com/xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.test.com/xsd" id="bk101">
    <author>Gambardella, MatthewMatthewMatthewMatthewMatthewMatthewMatthewMatthew</author>
    <price>44.95s</price>
  </book>
  <book xmlns="http://www.test.com/xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.test.com/xsd" id="bk102">
    <author>Ralls, Kim</author>
    <price>5.95</price>
  </book>
</catalog>

Answer

> I’m able to run validations using setSchema but it will throw exception and kill whole job.

You can use a fault-tolerant step and declare the exception thrown on validation failure as skippable. With this in place, invalid items are skipped and the job continues with the next items instead of failing at the first invalid one, until the configured skip limit is reached. For more details, refer to the "Configuring Skip Logic" section of the reference documentation.
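
As a rough sketch of such a step, the configuration below uses the Spring Batch 5 builder style (`StepBuilder` with a `JobRepository`); on Spring Batch 4 the same `faultTolerant()`, `skip(...)` and `skipLimit(...)` calls would hang off a `StepBuilderFactory` instead. Several things here are assumptions rather than part of the question: the step name `bookStep`, the chunk size of 10, the skip limit of 100, the `ItemWriter<BookDto>` bean, and the reader being injectable as `ItemReader<BookDto>`. The skipped exception type is also an assumption: with `Jaxb2Marshaller`, schema violations usually surface as `org.springframework.oxm.UnmarshallingFailureException`, but check the stack trace of the failing job and skip whatever type it actually reports.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.oxm.UnmarshallingFailureException;
import org.springframework.transaction.PlatformTransactionManager;

// Hypothetical configuration class; BookDto is the asker's class,
// assumed here to live in the same package.
@Configuration
public class BookStepConfig {

    @Bean
    public Step bookStep(JobRepository jobRepository,
                         PlatformTransactionManager transactionManager,
                         ItemReader<BookDto> reader,
                         ItemWriter<BookDto> writer) {
        return new StepBuilder("bookStep", jobRepository)
                .<BookDto, BookDto>chunk(10, transactionManager) // assumed chunk size
                .reader(reader)
                .writer(writer)
                .faultTolerant()
                // Assumed exception type: verify against the real stack trace
                .skip(UnmarshallingFailureException.class)
                // Assumed limit: the step fails once more than 100 items are skipped
                .skipLimit(100)
                .build();
    }
}
```

With the sample file from the question, the first book (overlong author, non-numeric price) would be skipped and the second one passed on to the writer.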

> My goal is to handle these invalid records, save them to log and skip them for final process.

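For that, register a SkipListener and log the invalid items there. Because these items fail while being read, the onSkipInRead callback is the one that fires, and it receives only the exception, not the item.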
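
A minimal sketch of such a listener follows. The class name `InvalidBookSkipListener` and the use of `java.util.logging` are illustrative choices, not something from the question; any logging framework would do. All three callbacks are implemented so the class compiles whether or not the Spring Batch version in use provides default methods on `SkipListener`.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

import org.springframework.batch.core.SkipListener;

// Hypothetical listener; BookDto is the asker's class,
// assumed here to live in the same package.
public class InvalidBookSkipListener implements SkipListener<BookDto, BookDto> {

    private static final Logger LOGGER =
            Logger.getLogger(InvalidBookSkipListener.class.getName());

    // Fires for fragments that fail XSD validation during reading.
    // No item exists yet, so only the exception is available to log.
    @Override
    public void onSkipInRead(Throwable t) {
        LOGGER.log(Level.WARNING, "Skipped invalid <book> fragment", t);
    }

    // Fires only if a processor is configured and a skippable exception occurs there.
    @Override
    public void onSkipInProcess(BookDto item, Throwable t) {
        LOGGER.log(Level.WARNING, "Skipped item during processing: " + item, t);
    }

    // Fires only if a skippable exception occurs while writing.
    @Override
    public void onSkipInWrite(BookDto item, Throwable t) {
        LOGGER.log(Level.WARNING, "Skipped item during writing: " + item, t);
    }
}
```

The listener is registered on the fault-tolerant step builder, for example by calling `.listener(new InvalidBookSkipListener())` after `.faultTolerant()`. The logged exception's cause chain usually carries the validator's message, which helps identify the offending fragment.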
