Develop Batch Artifacts in Script Languages
Whether for data ETL or quick testing, script languages offer a valuable alternative to Java for developing batch applications. JBeret supports writing Batchlet, ItemReader, ItemProcessor and ItemWriter in popular script languages. A job xml may reference an external script resource, or directly include the script as CDATA or PCDATA.
JBeret relies on the JSR-223 Scripting for the Java™ Platform standard API (available in Java SE) to run script batch artifacts. Each script language used by a batch application requires its JSR-223-compliant script engine as a runtime dependency. The following table lists some common script languages and script engines:
| Script Language | Script Engine | Obtain from | Impl javax.script.Invocable? | Suitable for |
|---|---|---|---|---|
| JavaScript | Mozilla Rhino | included in Oracle JDK 6 & 7 | Yes | reader, processor, writer & batchlet |
| JavaScript | Oracle Nashorn | included in Oracle JDK 8 | Yes | reader, processor, writer & batchlet |
| Groovy | org.codehaus.groovy:groovy-jsr223 | Maven Central | Yes | reader, processor, writer & batchlet |
| Jython / Python | org.python:jython | Maven Central | Yes | reader, processor, writer & batchlet |
| JRuby / Ruby | org.jruby:jruby | Maven Central | Yes | reader, processor, writer & batchlet |
| Scala | org.scala-lang:scala-compiler | Maven Central | No | processor & batchlet |
| PHP | com.caucho:resin-quercus | caucho-repository http://caucho.com/m2/ | No | processor & batchlet |
| R (Renjin) | org.renjin:renjin-script-engine | http://nexus.bedatadriven.com/content/groups/public | Yes | processor & batchlet |
The following XML snippet includes all the above script engine dependencies. A batch application should only include what is really needed at runtime.
<dependencies>
<dependency>
<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy-jsr223</artifactId>
</dependency>
<dependency>
<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy</artifactId>
</dependency>
<dependency>
<groupId>org.jruby</groupId>
<artifactId>jruby</artifactId>
</dependency>
<dependency>
<groupId>org.python</groupId>
<artifactId>jython</artifactId>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-compiler</artifactId>
</dependency>
<dependency>
<groupId>com.caucho</groupId>
<artifactId>resin-quercus</artifactId>
</dependency>
<dependency>
<groupId>org.renjin</groupId>
<artifactId>renjin-script-engine</artifactId>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</dependency>
</dependencies>
<repositories>
<repository>
<id>caucho-repository</id>
<url>http://caucho.com/m2/</url>
</repository>
<repository>
<id>bedatadriven</id>
<name>bedatadriven public repo</name>
<url>http://nexus.bedatadriven.com/content/groups/public/</url>
</repository>
</repositories>
Write Batchlet, ItemReader, ItemProcessor or ItemWriter in Scripts
Batch artifacts written in script languages must follow these rules:
Rules for ItemReader script:
- The script engine must implement javax.script.Invocable so JBeret can invoke various methods defined in the ItemReader interface.
- An ItemReader script must implement a readItem function, or another function mapped to the readItem method. Other methods from the ItemReader interface (open, close & checkpointInfo) may be optionally implemented.
Rules for ItemProcessor script:
- If the script engine implements javax.script.Invocable, and the ItemProcessor script implements a processItem function, or another function mapped to the processItem method, that function will be invoked.
- Otherwise, the script content is evaluated to fulfill the processItem method.
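As an illustration of the first ItemProcessor rule, a Jython processor script might look like the sketch below. The item layout (a list of column strings, as a CSV reader would produce) and the filtering rule are assumptions made for this example; only the function name processItem comes from the batch API.

```python
# Hypothetical ItemProcessor script for Jython.
# With an Invocable engine, JBeret looks up and invokes processItem.

def processItem(item):
    """Trim each column value; return None to filter out blank rows."""
    cleaned = [value.strip() for value in item]
    # In the batch API, a null (None) result means the item is filtered
    # and is not passed on to the writer.
    if not any(cleaned):
        return None
    return cleaned
```

Called directly, `processItem([" 1 ", "2\n"])` returns `['1', '2']`, and a row containing only blank strings returns `None`.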
Rules for ItemWriter script:
- The script engine must implement javax.script.Invocable so JBeret can invoke various methods defined in the ItemWriter interface.
- An ItemWriter script must implement a writeItems function, or another function mapped to the writeItems method. Other methods from the ItemWriter interface (open, close & checkpointInfo) may be optionally implemented.
Rules for Batchlet script:
- process method requirement:
  - If the script engine implements javax.script.Invocable, and the Batchlet script implements a process function, or another function mapped to the process method, that function will be invoked to fulfill the process method.
  - Otherwise, the script content is evaluated to fulfill the process method.
- stop method requirement:
  - If the script engine implements javax.script.Invocable, and the Batchlet script implements a stop function, or another function mapped to the stop method, that function will be invoked to fulfill the stop method.
  - Otherwise, nothing is done.
Method-to-function Mapping
By default, script function names are the same as method names in batch API interfaces. In certain cases, custom names may be needed to avoid naming conflicts or to follow a different naming convention. This can be achieved with the methodMapping property under the <reader>, <processor>, <writer>, or <batchlet> element in job xml.
The property value is a comma-separated list of key-value pairs, with the key as method name in batch API interfaces, and value as the function name in script.
<property name="methodMapping" value="open=openBatchReader, close=closeBatchReader"/>
Note that batch API method names like open or close may be reserved words or built-in functions in certain script languages. If so, they must be mapped to different function names.
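To make the value format concrete, the sketch below parses a methodMapping string into a dictionary and resolves a batch API method name to a script function name. It is written for illustration only and is not JBeret's own parsing code; the names parse_method_mapping and function_for are hypothetical.

```python
def parse_method_mapping(value):
    """Turn 'open=openBatchReader, close=closeBatchReader' into a dict."""
    mapping = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma or an empty segment
        method_name, function_name = pair.split("=", 1)
        mapping[method_name.strip()] = function_name.strip()
    return mapping

def function_for(mapping, method_name):
    # Unmapped methods fall back to the batch API method name itself.
    return mapping.get(method_name, method_name)
```

For the property shown above, `function_for(mapping, "open")` yields `openBatchReader`, while an unmapped method such as `readItem` keeps its default name.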
Access Batch Objects from Script
JBeret exposes the following batch objects to the script in the scope of javax.script.ScriptContext#ENGINE_SCOPE, so the script application can access information and interact with batch runtime.
| Name | Description | Type | Example (syntax varies) |
|---|---|---|---|
| jobContext | JobContext of the current job execution | javax.batch.runtime.context.JobContext | jobContext.getJobName(), jobContext.setExitStatus('xxx') |
| stepContext | StepContext of the current step execution | javax.batch.runtime.context.StepContext | stepContext.getStepName() |
| batchProperties | properties specified in job xml under the current batch artifact | java.util.Properties | batchProperties.get('testName') |
Configure Script in Job XML
Script may be configured in job xml either as a direct sub-element, or as an external reference. To achieve that, a <script> element is added to the standard schema, as a sub-element of <batchlet>, <reader>, <processor>, or <writer>. <script> element appears after any <properties> element for the same artifact definition.
<script> has 2 attributes:
| Attribute Name | Description | Required? | Examples |
|---|---|---|---|
| type | Identify the script language being used. Its value should be recognizable by the underlying script engine, either as a MIME-type, or an engine name. Some typical values are: javascript, groovy, jython, php, jruby, scala. | required for inline script, and optional for external script if its file extension can already identify the script language | type = "javascript" |
| src | resource path, file path or URL to the external script | required for external script, and must not be used for inline script | src = "javascript/item-reader.js" |
If a <script> element is present, the ref attribute of the enclosing <batchlet>, <reader>, <processor>, or <writer> element must not be specified, because the script already defines the artifact and there is no need for an artifact ref name.
Inline Script in Job XML
<job id="batchletJavascriptInline" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
<step id="batchletJavascriptInline.step1">
<batchlet>
<properties>
<property name="testName" value="#{jobParameters['testName']}"/>
</properties>
<script type="javascript">
<![CDATA[
function stop() {
print('In stop function\n');
}
//access built-in variables: jobContext, stepContext and batchProperties,
//set job exit status to the value of testName property, and
//return the value of testName property as step exit status,
//
function process() {
print('jobName: ' + jobContext.getJobName() + '\n');
print('stepName: ' + stepContext.getStepName() + '\n');
var testName = batchProperties.get('testName');
jobContext.setExitStatus(testName);
return testName;
}
]]>
</script>
</batchlet>
</step>
</job>
The script content may be CDATA or element text, and CDATA should be the preferred method to avoid issues with special characters.
Note that in certain script languages (e.g., Python), indentation is significant. Inline script content in such languages should therefore start at the left-most column, regardless of the surrounding XML indentation.
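The effect can be reproduced outside JBeret with Python's built-in compile function. The sketch assumes nothing about how JBeret hands inline content to the script engine; it only shows that Python rejects top-level code that begins with leading whitespace, which is what XML-style indentation produces.

```python
# Inline script text as it would look if indented along with the XML.
indented = "        def writeItems(items):\n            pass\n"
# The same script starting at the left-most column.
flush_left = "def writeItems(items):\n    pass\n"

def compiles(source):
    """Return True if Python accepts the source text."""
    try:
        compile(source, "<inline-script>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

indented_ok = compiles(indented)
flush_left_ok = compiles(flush_left)
```

Here `indented_ok` is `False` and `flush_left_ok` is `True`.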
Inline script is convenient for short or even no-op batch artifacts, for example:
<writer>
<script type="javascript">
function writeItems(items) {}
</script>
</writer>
External Script Resource Referenced by Job XML
To reference an external script file or resource, simply specify src attribute, and optionally type attribute.
<job id="batchletGroovySrc" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
<step id="batchletGroovySrc.step1">
<batchlet>
<properties>
<property name="testName" value="#{jobParameters['testName']}"/>
</properties>
<script src="groovy/simple-batchlet.groovy"/>
</batchlet>
</step>
</job>
Examples
External Jython Reader, Inline Processor and Writer:
<job id="chunkPython" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
<step id="chunkPython.step1">
<chunk item-count="3">
<reader>
<properties>
<property name="resource" value="#{systemProperties['java.io.tmpdir']}/numbers.csv"/>
<!-- open and close are built-in functions in Python, so batch API's open & close
methods need to be mapped to other names to avoid overriding Python built-in functions.
The following methodMapping property maps batch API's open method to the openBatchReader
function, and close method to the closeBatchReader function, in the src script.
-->
<property name="methodMapping" value="open=openBatchReader, close=closeBatchReader"/>
</properties>
<script type="python" src="python/item-reader.py"/>
</reader>
<processor>
<script type="python"></script>
</processor>
<writer>
<properties>
<property name="methodMapping" value="open=openBatchWriter, close=closeBatchWriter"/>
</properties>
<script type="python"></script>
</writer>
</chunk>
<end on="*" exit-status="chunkPython"/>
</step>
</job>
item-reader.py:
rows = []
position = 0

def openBatchReader(checkpoint):
    # read the whole CSV resource into rows; resume from checkpoint on restart
    global rows, position
    resource = batchProperties.get("resource")
    f = open(resource, 'rb')
    try:
        for row in f.readlines():
            columnValues = row.split(",")
            rows.append(columnValues)
    finally:
        f.close()
    if checkpoint is not None:
        position = checkpoint

def checkpointInfo():
    return position

def readItem():
    # return the next row, or None when all rows have been read
    global position
    if position >= len(rows):
        return None
    item = rows[position]
    position += 1
    return item
External Groovy Reader, Inline Processor and Writer:
<job id="chunkGroovy" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
<step id="chunkGroovy.step1">
<chunk item-count="3">
<reader>
<properties>
<property name="resource" value="numbers.csv"/>
</properties>
<script type="groovy" src="groovy/ItemReader.groovy"/>
</reader>
<processor>
<script type="groovy"></script>
</processor>
<writer>
<script type="groovy"></script>
</writer>
</chunk>
<end on="*" exit-status="chunkGroovy"/>
</step>
</job>
ItemReader.groovy:
package groovy
import groovy.transform.Field
@Field List<String[]> rows;
@Field int position = 0;
def open(checkpoint) {
String resourcePath = batchProperties.get("resource");
InputStream inputFile = this.class.getClassLoader().getResourceAsStream(resourcePath);
String[] lines = inputFile.text.split('\n');
rows = lines.collect { it.split(',') };
inputFile.close();
if (checkpoint != null) {
position = checkpoint;
}
println("ItemReader.groovy open, rows: " + rows);
}
def checkpointInfo() {
return position;
}
def readItem() {
if (position >= rows.size()) {
return null;
}
return rows.get(position++);
}
External Ruby Reader, Inline Processor and Writer:
<job id="chunkRuby" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
<step id="chunkRuby.step1">
<chunk item-count="3">
<reader>
<properties>
<property name="resource" value="#{systemProperties['java.io.tmpdir']}/numbers.csv"/>
<!-- open and close are built-in functions in Ruby, so batch API's open & close
methods need to be mapped to other names to avoid overriding Ruby built-in functions.
The following methodMapping property maps batch API's open method to the openBatchReader
function, and close method to the closeBatchReader function, in the src script.
-->
<property name="methodMapping" value="open=openBatchReader, close=closeBatchReader"/>
</properties>
<script type="jruby" src="ruby/item-reader.rb"/>
</reader>
<processor>
<script type="jruby"></script>
</processor>
<writer>
<properties>
<property name="methodMapping" value="open=openBatchWriter, close=closeBatchWriter"/>
</properties>
<script type="jruby"></script>
</writer>
</chunk>
<end on="*" exit-status="chunkRuby"/>
</step>
</job>
item-reader.rb:
require 'csv'
$rows = []
$position = 0
def openBatchReader(checkpoint)
resource = $batchProperties.get("resource")
puts(resource)
$rows = CSV.read(resource)
if checkpoint != nil
$position = checkpoint
end
puts($rows)
end
def closeBatchReader()
puts('In reader close')
end
def checkpointInfo()
return $position
end
def readItem()
if $position >= $rows.length
return nil
end
item = $rows[$position]
$position += 1
return item
end
Inline Scala Batchlet:
<job id="batchletScalaInline" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
<step id="batchletScalaInline.step1">
<batchlet>
<properties>
<property name="testName" value="#{jobParameters['testName']}"/>
</properties>
<script type="scala">
import java.util.Properties
import javax.batch.runtime.context.{StepContext, JobContext}
val jobContext1 = jobContext.asInstanceOf[JobContext]
val stepContext1 = stepContext.asInstanceOf[StepContext]
val batchProperties1 = batchProperties.asInstanceOf[Properties]
println("jobName: " + jobContext1.getJobName())
println("stepName: " + stepContext1.getStepName())
val testName : String = batchProperties1.get("testName").asInstanceOf[String]
jobContext1.setExitStatus(testName)
return testName;
</script>
</batchlet>
</step>
</job>
Inline R (Renjin) Batchlet:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE job [
<!ENTITY batchlet-properties-segment SYSTEM "batchlet-properties-segment.xml">
]>
<job id="batchletRInline" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
<step id="batchletRInline.step1">
<batchlet>
&batchlet-properties-segment;
<script type="Renjin"></script>
</batchlet>
</step>
</job>