Spring Boot Batch CSV Writer - Custom

Sunday Feb 16, 2020

Spring Boot’s Batch support is a very handy way to implement pipeline processing for our applications.

When we need to write a CSV file, Spring provides a FlatFileItemWriter that can build each line from bean fields (among other things).

However, it does not support escaping field values or quoting fields in the CSV file.

Here we will look at how to extend BeanWrapperFieldExtractor to quote fields and escape characters.

If you need to, you can see the primer on Spring Batch here.

Spring Architecture

It sounds complicated, but it really isn’t. What you do is set up a bean that converts your objects into arrays of data that get written to a file. The trick is to override the bit that reads the values from your object and modify the data as it comes out. We’ll see below that it is not that difficult.

1. First, we should note that Spring provides a `FlatFileItemWriterBuilder` to help set up a `FlatFileItemWriter`.

2. Next, the `FlatFileItemWriter` uses a `LineAggregator` (or in our case a `DelimitedLineAggregator`) to write a delimited line of data.

3. This aggregator uses a `BeanWrapperFieldExtractor` to get the field values of the object and return an array of objects that correspond to the CSV fields (a minimal sketch of how these pieces fit together follows below).
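To make that chain concrete, here is a minimal sketch of how the pieces fit together, using an illustrative `Example` bean (the bean, the `main` function, and the field names here are mine, not from the project):

import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor
import org.springframework.batch.item.file.transform.DelimitedLineAggregator

// Illustrative bean; the real beans in this post (Store, IESConcept) come later.
data class Example(val name: String, val description: String)

fun main() {
    // The extractor pulls the named properties off the bean...
    val extractor = BeanWrapperFieldExtractor<Example>()
    extractor.setNames(arrayOf("name", "description"))

    // ...and the aggregator joins the extracted values with the delimiter (a comma by default).
    val aggregator = DelimitedLineAggregator<Example>()
    aggregator.setFieldExtractor(extractor)

    // Prints: Test1,This is a test
    println(aggregator.aggregate(Example("Test1", "This is a test")))
}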

Field Extractor

One easy way to implement this is to extend the default BeanWrapperFieldExtractor and further modify the field data after it is extracted. To do that, we take the extracted values and iterate over them to:

1. Convert the value to a String and surround it with our quote character.
2. Handle any quote characters that already appear in the data.

The first is easy, and for the second it makes sense (in this case) to simply substitute ' for each ". For example, the value He said "hi" is written out as "He said 'hi'".

Here is the code for that:

import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor

class QuotingBeanWrapperFieldExtractor<T>(private val quote: String = "\"", private val replaceWith: String = "'") :
        BeanWrapperFieldExtractor<T>() {
    override fun extract(item: T): Array<Any> {
        // Let the default extractor pull the property values off the bean.
        val items = super.extract(item)
        val quotedItems = ArrayList<Any>()

        items.forEach {
            val stringValue = it?.toString() ?: ""
            // Replace any embedded quote characters, then wrap the whole value in quotes.
            quotedItems.add("${quote}${stringValue.replace(quote, replaceWith)}${quote}")
        }
        return quotedItems.toArray()
    }
}
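To sanity check the extractor on its own, here is a quick example using the illustrative `Example` bean from the sketch above; the output in the comment is what the replace-and-quote logic should produce:

val extractor = QuotingBeanWrapperFieldExtractor<Example>()
extractor.setNames(arrayOf("name", "description"))

// Embedded double quotes are swapped for single quotes, then each value is wrapped in quotes.
// Prints: ["Test", "He said 'hi'"]
println(extractor.extract(Example("Test", "He said \"hi\"")).contentToString())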

CSV Writer Configuration

With our modified extractor, we simply have to configure a CSV writer to use it and we are all set.

@Bean
@Qualifier("concept-csv-writer")
fun conceptCsvWriter(@Value("\${com.ibm.cio.ies.exportnotesdata.batch.concept-csv-output-file:/tmp/test.csv}")
                     csvOutputFile: String): ItemWriter<IESConcept> {
    val lineAggregator = DelimitedLineAggregator<IESConcept>()
    // The field extractor is where our quoting extractor gets plugged in (see the sketch below).
    lineAggregator.setFieldExtractor(iesBeanWrapperFieldExtractor())
    val writer = FlatFileItemWriterBuilder<IESConcept>()
            .resource(FileSystemResource(csvOutputFile))
            .lineAggregator(lineAggregator)
            .name("Concept CSV Writer")
            .headerCallback(
                    object : FlatFileHeaderCallback {
                        override fun writeHeader(writer: Writer) {
                            // Join with "," explicitly; the joinToString() default adds a space after each comma.
                            writer.write(conceptFieldArray.joinToString(","))
                        }
                    }
            )
    return writer.build()
}
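The `iesBeanWrapperFieldExtractor()` function and `conceptFieldArray` referenced above are not shown here; assuming the extractor bean simply wires our quoting extractor to the concept field names, it might look something like this:

fun iesBeanWrapperFieldExtractor(): BeanWrapperFieldExtractor<IESConcept> {
    // Plug in the quoting extractor from earlier and tell it which bean properties to pull.
    val extractor = QuotingBeanWrapperFieldExtractor<IESConcept>()
    extractor.setNames(conceptFieldArray)
    return extractor
}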

Batch Configuration

Now, we’ll set up a simple “dummy” reader:

class DataReader : ItemReader<Store> {
    // Two hard-coded records; note the embedded newline in the second description.
    val stores = arrayOf(Store("Test1", "This is a test"), Store("Test2", "This is \n another test"))
    var counter = 0
    // Return each Store once, then null to signal the end of input.
    override fun read(): Store? = if (counter >= stores.size) null else stores[counter++]
}
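The `Store` bean and the `STORE_FIELDS` constant used throughout this configuration are not shown in the post; based on the constructor calls above and the name,description header in the output below, they presumably look something like this:

data class Store(val name: String, val description: String)

// The Store properties the field extractor reads, which double as the CSV header, in order.
val STORE_FIELDS = arrayOf("name", "description")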

With that done, here is the entirety of the batch configuration:

    fun getOutputFileResource(): Resource =
            FileSystemResource("${System.getProperty("java.io.tmpdir")}${File.separator}spring-csv-test.csv")

    fun getCSVQuotingBeanWrapperFieldExtractor(): BeanWrapperFieldExtractor<Store> {
        // Use the quoting extractor we defined above and tell it which Store properties to pull.
        val extractor = QuotingBeanWrapperFieldExtractor<Store>()
        extractor.setNames(STORE_FIELDS)
        return extractor
    }

    fun getStoreCSVWriter() : ItemWriter<Store> {
        val resource = getOutputFileResource()
        logger.info("Results will be written to ${resource}")

        val lineAggregator = DelimitedLineAggregator<Store>()
        lineAggregator.setFieldExtractor(getCSVQuotingBeanWrapperFieldExtractor())

        val writer = FlatFileItemWriterBuilder<Store>()
                .resource(resource)
                .lineAggregator(lineAggregator)
                .name("Store CSV Writer")
                .headerCallback(object: FlatFileHeaderCallback {
                    override fun writeHeader(writer: Writer) {
                        writer.write(STORE_FIELDS.joinToString(","))
                    }
                }).build()
        return writer
    }

    @Bean(name = ["Batch example 1"])
    fun jobLogAllFilteredTextItems(jbf: JobBuilderFactory, sbf: StepBuilderFactory): Job {
        val compositeItemWriter = CompositeItemWriter<Store>()
        // Delegate to both DataWriter (not shown in the post; see the sketch after this configuration) and our CSV writer.
        compositeItemWriter.setDelegates(listOf(DataWriter(), getStoreCSVWriter()))

        return jbf.get("demo-text-job1").start(
                sbf.get("demo-text-step1")
                        .chunk<Store, Store>(1)
                        .reader(DataReader())
                        .writer(compositeItemWriter)
                        .build()).build()
    }
    
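The `DataWriter` delegate in the composite writer is also not shown; assuming it is just a writer that logs each item (my guess, not the post’s code), a minimal stand-in for Spring Batch 4.x could be:

class DataWriter : ItemWriter<Store> {
    // Log each chunk of items as it passes through the step.
    override fun write(items: MutableList<out Store>) {
        items.forEach { println("Writing store: $it") }
    }
}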

After we run our application, we can grab the temp file path from the log output and check it out. Let’s cat our test file and confirm that the fields are surrounded with " and we are all set:

 cat /var/folders/hh/clds_1sx6hq33k_1cqynj_fw0000gn/T/spring-csv-test.csv

name,description
"Test1","This is a test"
"Test2","This is 
 another test"

That’s it: the first record is good, and the second has a quoted value containing a newline!

Conclusion

It’s pretty simple to modify most Spring processes, and CSV writing is no exception. Here, with minimal code, we were able to create a quoted CSV writer with Spring Boot’s Spring Batch components.