Excel ItemReaders and ItemWriters

jberet-module includes excelEventItemReader, excelUserModelItemReader, excelUserModelItemWriter, excelStreamingItemReader, excelStreamingItemWriter for working with Excel files, including both binary Excel files (.xls) and OOXML Excel files (.xlsx). They are all based on Apache POI library for low-level Excel operations.

Name Excel File Type Underlying POI or XML API Benefits
excelEventItemReader .xls POI event model API small memory footprint
excelUserModelItemReader .xls & .xlsx POI user model API simple API for in-memory access
excelUserModelItemWriter .xls & .xlsx POI user model API simple API for in-memory content generation
excelStreamingItemReader .xlsx POI XSSF streaming reader API & StAX stream read large data set
excelStreamingItemWriter .xlsx POI SXSSF (buffered streaming) API write large data set

The following dependencies are required:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>${version.org.apache.poi}</version>
    <scope>compile</scope>
</dependency>
<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>${version.commons-codec}</version>
</dependency>

<!-- XML-related libraries needed for working with OOXML Excel files -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>${version.org.apache.poi}</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>${version.org.apache.poi}</version>
</dependency>
<dependency>
    <groupId>org.apache.xmlbeans</groupId>
    <artifactId>xmlbeans</artifactId>
    <version>${version.org.apache.xmlbeans}</version>
    <exclusions>
        <exclusion>
            <groupId>stax</groupId>
            <artifactId>stax-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>dom4j</groupId>
    <artifactId>dom4j</artifactId>
    <version>${version.dom4j}</version>
    <exclusions>
        <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Batch Configuration in Job XML

excelEventItemReader, excelUserModelItemReader, excelUserModelItemWriter, excelStreamingItemReader, excelStreamingItemWriter are configured through <reader> or <writer> batch properties in job xml. All properties are of type String, unless noted otherwise. The following is example job xml files that reference these readers and writers:

excelEventItemReader

<job id="ExcelReaderIBMTest" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="ExcelReaderIBMTest.step1">
        <chunk item-count="100">
            <reader ref="excelEventItemReader">
                <properties>
                    <property name="resource" value="IBM_unadjusted.xls"/>
                    <property name="beanType" value="org.jberet.support.io.StockTrade"/>
                    <property name="sheetName" value="#{jobParameters['sheetName']}"/>
                    <property name="start" value="1"/>
                    <property name="headerRow" value="0"/>

                    <!-- do not take header values; use headerRow (its default value 0) instead -->
                    <!--<property name="header" value="#{jobParameters['header']}"/>-->

                    <!-- example how to skip performing Bean Validation on the data POJO -->
                    <property name="skipBeanValidation" value="true"/>
                </properties>
            </reader>
            <writer ref="csvItemWriter">
                <properties>
                    <property name="resource" value="#{jobParameters['writeResource']}"/>
                    <property name="writeMode" value="overwrite"/>
                    <property name="beanType" value="#{jobParameters['beanType']}"/>

                    <!-- for csvItemWriter, use the header passed from jobParameters -->
                    <property name="header" value="#{jobParameters['header']}"/>
                </properties>
            </writer>
        </chunk>
    </step>
</job>

excelUserModelItemReader

<job id="ExcelReaderTest" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="ExcelReaderTest.step1">
        <chunk item-count="100">
            <reader ref="excelUserModelItemReader">
                <properties>
                    <property name="resource" value="person-movies.xlsx"/>
                    <property name="beanType" value="org.jberet.support.io.Person"/>
                    <property name="sheetName" value="Sheet1"/>
                    <property name="start" value="1"/>
                    <property name="headerRow" value="0"/>

                    <!-- need to ignore unknown properties as some additional properties in data -->
                    <property name="deserializationFeatures" value="FAIL_ON_UNKNOWN_PROPERTIES=false"/>
                </properties>
            </reader>

            <writer ref="jsonItemWriter">
                <properties>
                    <property name="resource" value="#{jobParameters['writeResource']}"/>
                    <property name="writeMode" value="overwrite"/>
                </properties>
            </writer>
        </chunk>
    </step>
</job>

excelUserModelItemWriter

<job id="ExcelWriterTest" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="ExcelWriterTest.step1">
        <chunk item-count="100">
            <reader ref="jsonItemReader">
                <properties>
                    <property name="resource" value="#{jobParameters['resource']}"/>
                    <property name="beanType" value="#{jobParameters['beanType']}"/>
                    <property name="start" value="#{jobParameters['start']}"/>
                    <property name="end" value="#{jobParameters['end']}"/>
                </properties>
            </reader>

            <writer ref="excelUserModelItemWriter">
                <properties>
                    <property name="resource" value="testMoviesBeanTypeFullTemplate.xlsx"/>
                    <property name="writeMode" value="overwrite"/>
                    <property name="beanType" value="org.jberet.support.io.Movie"/>
                    <property name="sheetName" value="Movies 2012"/>
                    <property name="templateResource" value="person-movies-company.xltx"/>
                    <property name="templateSheetName" value="Movies"/>
                    <property name="templateHeaderRow" value="0"/>
                </properties>
            </writer>
        </chunk>
    </step>
</job>

excelStreamingItemReader

<reader ref="excelStreamingItemReader">
    <properties>
        <property name="resource" value="person-movies.xlsx"/>
        <property name="beanType" value="org.jberet.support.io.Person"/>
        <property name="sheetName" value="Sheet1"/>
        <property name="start" value="1"/>
        <property name="headerRow" value="0"/>

        <!-- need to ignore unknown properties as some additional properties in data -->
        <property name="deserializationFeatures" value="FAIL_ON_UNKNOWN_PROPERTIES=false"/>
    </properties>
</reader>

excelStreamingItemWriter

<job id="ExcelStreamingWriterTest" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="ExcelStreamingWriterTest.step1">
        <chunk item-count="100">
            <reader ref="csvItemReader">
                <properties>
                    <property name="resource" value="#{jobParameters['resource']}"/>
                    <property name="headerless" value="#{jobParameters['headerless']}"/>
                    <property name="beanType" value="#{jobParameters['beanType']}"/>
                    <property name="start" value="#{jobParameters['start']}"/>
                    <property name="end" value="#{jobParameters['end']}"/>
                    <property name="nameMapping" value="#{jobParameters['nameMapping']}"/>
                    <property name="cellProcessors" value="#{jobParameters['cellProcessors']}"/>
                </properties>
            </reader>

            <writer ref="excelStreamingItemWriter">
                <properties>
                    <property name="resource" value="testMoviesBeanTypeFullStreaming.xlsx"/>
                    <property name="writeMode" value="overwrite"/>
                    <property name="beanType" value="org.jberet.support.io.Movie"/>
                    <property name="sheetName" value="Movies 2012"/>
                    <property name="serializationFeatures" value="WRITE_DATES_AS_TIMESTAMPS=false"/>
                </properties>
            </writer>
        </chunk>
    </step>
</job>

Batch Properties for All Excel Readers and Writers

resource

The resource to read from (for batch readers), or write to (for batch writers).

beanType

java.lang.Class

java.lang.String[]

Specifies the header as an ordered string array. For reader, header information must be specified with either this property or ExcelUserModelItemReader.headerRow property. This property is typically specified when there is no header row in the Excel file. For example,

"id, name, age" specifies 1st column is id, 2nd column is name and 3rd column is age.

This is a required property for writer.

sheetName

The optional name of the target sheet. When specified for a reader, it has higher precedence over ExcelUserModelItemReader.sheetIndex

additional jackson-related batch properties

All Excel readers and writers use jackson-databind for data transformation behind the scene, and this step can be configured through the following batch configuration properties in job xml, though the defaults should suffice in most cases:

  • jsonFactoryFeatures
  • mapperFeatures
  • jsonFactoryLookup
  • serializationFeatures
  • customSerializers
  • deserializationFeatures
  • customDeserializers
  • customDataTypeModules

See Chapter JsonItemReader and JsonItemWriter for more details.


Batch Properties for excelUserModelItemReader

In additional to the shared batch properties listed above, excelUserModelItemReader can also be configured through the following batch properties:

start

int

A positive integer indicating the start position in the input resource. It is optional and defaults to 0 (starting from the 1st data item). If a header row is present, the start point should be after the header row.

end

int

A positive integer indicating the end position in the input resource. It is optional and defaults to Integer.MAX_VALUE.

sheetIndex

int

The index (0-based) of the target sheet to read, defaults to 0.

headerRow

java.lang.Integer

The physical row number of the header.


Batch Properties for excelEventItemReader

All the shared batch properties for all Excel readers and writers

All the batch properties for excelUserModelItemReader

queueCapacity

int

the capacity of the queue used by org.apache.poi.hssf.eventusermodel.HSSFListener to hold pre-fetched data rows. Optional property and defaults to ExcelEventItemReader.MAX_WORKSHEET_ROWS (65536).


Batch Properties for excelStreamingItemReader

All the shared batch properties for all Excel readers and writers

All the batch properties for excelUserModelItemReader


Batch Properties for excelUserModelItemWriter

In addition to the shared batch properties for all Excel readers and writers, excelUserModelItemWriter also support the following batch properties:

writeMode

Valid writeMode for this writer class is overwrite and failIfExists.

templateResource

The resource of an existing Excel file or template file to be used as a base for generating output Excel. Its format is similar to resource batch property.

templateSheetName

The sheet name in the template file to be used for generating output Excel. If templateResource is specified but this property is not specified, templateSheetIndex is used instead.

templateSheetIndex

int

The sheet index (0-based) in the template file to be used for generating output Excel.

templateHeaderRow

java.lang.Integer

The row number (0-based) of the header in the template sheet. If header property is provided in job xml file, then this property is ignored. Otherwise, it is used to retrieve header values.


Batch Properties for excelStreamingItemWriter

All the shared batch properties for all Excel readers and writers

All the batch properties for excelUserModelItemWriter

compressTempFiles

java.lang.Boolean

Whether to compress the temp files in the course of generating Excel file, defaults to false.