Executing External Processes From Groovy – Improved

I’ve started using an exec method, shown below, to run subprocesses from Groovy. If you run the standard execute like:

def torun = "ls -l "
proc = torun.execute()

This will hang on UNIX systems if the program writes more STDOUT data than the pipe buffer can hold and nothing reads it. It is better to drain the output yourself with the following:

#!/usr/bin/env groovy

void execute(def command) {
    println "Running command : " + command
    ProcessBuilder pb = new ProcessBuilder("/bin/sh", "-c", command)
    // Merge stderr into stdout so a full stderr pipe cannot block the child either.
    pb.redirectErrorStream(true)
    def process = pb.start()
    def output = process.inputStream
    output.eachLine { } // Throw away stdout in this example, could return it at the end.
    // waitFor() blocks until the process has exited and returns its exit code.
    println "Done executing command , return value : " + process.waitFor()
}

execute("ls -l /usr/bin ")

NOTE From: http://jira.codehaus.org/browse/GROOVY-2620

A better solution is:

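def command = "ls -l /usr/bin"   // same example command as above
StringWriter stringWriterOutput = new StringWriter()
StringWriter stringWriterError = new StringWriter()
Process proc = command.execute()
// Copy everything the process writes into the two writers.
// Note: stdout is read to the end before stderr is touched.
stringWriterOutput << proc.in
stringWriterError << proc.err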
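
The snippet above reads stdout to the end before it starts on stderr, so a process that writes a lot to stderr could still block. Here is a minimal sketch of one way around that, using the GDK method waitForProcessOutput, which reads both streams in background threads. The helper name runCommand and the sample shell command are my own assumptions for illustration, not part of Groovy or the JIRA issue.

```groovy
// Sketch: run a shell command and capture stdout and stderr at the same time.
// runCommand is a hypothetical helper name, not part of Groovy.
Map runCommand(String command) {
    def out = new StringBuilder()
    def err = new StringBuilder()
    Process proc = ["/bin/sh", "-c", command].execute()
    // Reads both streams in background threads and waits for the process to end.
    proc.waitForProcessOutput(out, err)
    return [exitCode: proc.exitValue(), stdout: out.toString(), stderr: err.toString()]
}

def result = runCommand("echo hello; echo oops 1>&2; exit 3")
println "exit=${result.exitCode} stdout=${result.stdout.trim()} stderr=${result.stderr.trim()}"
```

Returning a map keeps the exit code and both streams together, so the caller can decide what to do with a failing command.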

Multithreaded simple URL Crawler

Here is a quick program that creates a pool of X threads and fetches a URL for each id returned by a database query.

#!/usr/bin/env groovy
import groovy.sql.Sql
// Multithreaded query script.
// Runs a query and then submits all the jobs as threads using Executors.newFixedThreadPool
// 3/2008 by George kowalski

// How many threads to allow to run at one time.
def MAX_THREADS = 10

println "Processing started .. Querying Database for RGDIds ... "

def sql = Sql.newInstance("jdbc:oracle:thin:@site.edu:1521:SCHEMA", "USERID", "passwd", "oracle.jdbc.driver.OracleDriver")
def service = java.util.concurrent.Executors.newFixedThreadPool(MAX_THREADS)

def rgdIDList = []

sql.eachRow("select * from genes, rgd_ids where genes.rgd_id = rgd_ids.rgd_id and rgd_ids.object_status = 'ACTIVE' and rgd_ids.species_type_key = 3", { rgdIDList << it.rgd_id })

println "Done with Query we will be processing ${rgdIDList.size()} ids "

// Class that is run.
class toRun implements Runnable {
    String id
    toRun(String newid) {
        this.id = newid
    }
    public void run() {
        println "Calling URL with id: ${id}"
        def contents = new URL("http://rgddev.mcw.edu/tools/genes/genes_view.cgi?id=${id}").getText()
        println "Return from: ${id}"
    }
}

for (id in rgdIDList) {
    println "Submitting Thread for id: ${id}"
    service.execute( new toRun(id.toString()) )
    // Optionally pause here to slow down the display on the console; not needed.
}

// Let the queued jobs finish, then allow the script to exit.
service.shutdown()
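
With a fixed thread pool it helps to know when every submitted job has finished. Here is a minimal sketch of the submit, shutdown and awaitTermination pattern. It assumes stand-in ids and a trivial job in place of the database query and the URL fetch, and the pool size of 4 and the 60 second timeout are arbitrary choices for illustration.

```groovy
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Stand-in ids; in the crawler these would come from the database query.
def ids = (1..20).collect { it.toString() }
// Thread-safe collection that records which ids have been processed.
def finished = new ConcurrentLinkedQueue<String>()

def pool = Executors.newFixedThreadPool(4)
ids.each { id ->
    // A closure can be coerced to Runnable; a real job would fetch the URL here.
    pool.execute({ finished << id } as Runnable)
}

// Accept no new jobs, but keep running the ones already queued.
pool.shutdown()
// Block until the queue drains or the timeout expires.
def done = pool.awaitTermination(60, TimeUnit.SECONDS)
println "All jobs finished: ${done}, processed ${finished.size()} ids"
```

Calling shutdown() is also what lets the script exit, since the pool's worker threads would otherwise keep the JVM alive.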

Running Oracle SQL / Call URL

The following program was whipped together to pre-cache pages on our server by hitting them with an HTTP request based on a query.

Running a SQL query and calling that URL like Curl to precache results

#!/usr/bin/env groovy
// Groovy Script to call each gene page in turn
// Requirements : Drop the Oracle jdbc14.jar driver in your $HOME/.groovy/lib directory then run this script
import groovy.sql.Sql

println "Started"
sql = Sql.newInstance("jdbc:oracle:thin:@server.com:1521:SID", "USERID", "password", "oracle.jdbc.driver.OracleDriver");

sql.eachRow("select * from genes, rgd_ids where genes.rgd_id = rgd_ids.rgd_id and rgd_ids.object_status = 'ACTIVE' and rgd_ids.species_type_key = 3", {
    println "Calling: http://rgd.mcw.edu/tools/genes/genes_view.cgi?id=${it.rgd_id}"
    // creates a new URL object and downloads that HTML into the contents variable
    def contents = new URL("http://rgd.mcw.edu/tools/genes/genes_view.cgi?id=${it.rgd_id}").getText()
    // Uncomment next line to print contents of page
    // println contents
})
println "Done"
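
One page that fails to load would abort the whole loop with an exception. Here is a rough sketch of a wrapper that logs the failure and carries on. The helper name fetchPage and the -1 return convention are assumptions of mine, and the demo reads a local temp file through a file: URL so that it runs without the database or the web server.

```groovy
// fetchPage is a hypothetical helper: it returns the length of the downloaded
// page, or -1 if the request failed for any I/O reason.
int fetchPage(String address) {
    try {
        return new URL(address).getText().length()
    } catch (IOException e) {
        println "Failed: ${address} (${e.message})"
        return -1
    }
}

// Demo against a local temp file instead of the real gene pages.
def page = File.createTempFile("page", ".html")
page.deleteOnExit()
page.text = "<html>cached</html>"

def okLength = fetchPage(page.toURI().toString())
def badLength = fetchPage(page.toURI().toString() + ".missing")
println "ok=${okLength} bad=${badLength}"
```

Inside the eachRow closure you would call fetchPage with the gene page URL in place of the bare getText() call.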