Easy Method Scheduler with Spring 3.x

Scheduling by Annotation

The Java community is wide, and sometimes it's easy to miss something simple. Always "google it first": if you are doing something and it seems harder than it should be, you're probably doing it wrong. So today, let's go ahead and schedule a recurring process in a Jersey REST service.

Use Case: Updating cached values every (N) period

Say you have a website that needs to be responsive, but it needs to "call home" every now and then to make sure it's running with the latest settings and data. In effect you want an expiring cache whose entries expire every (N) period (a TTL, or time-to-live). With Java and Spring this is super easy.


We need a public method that does not return a value, and we need to define the period. Spring has many options (http://docs.spring.io/spring/docs/3.0.x/spring-framework-reference/html/scheduling.html) for defining the period: cron expressions, fixed intervals, and so on. In this instance I am choosing fixedRate, which schedules runs a fixed period apart measured from the start of each execution; executions of the same method won't overlap, so if a refresh overruns, the next run simply starts late (use fixedDelay if you'd rather measure the interval from the completion of the previous run).

// must be a compile-time constant to be referenced from the annotation
private static final long TEN_MINUTES = 1000 * 60 * 10;

@Scheduled(fixedRate = TEN_MINUTES)
public void refresh() {
   // ... do something cool to update your cache/data
}


Now we add our Spring configuration and, lo and behold, we'll have automated polling! It really is that easy!

<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:task="http://www.springframework.org/schema/task"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/task http://www.springframework.org/schema/task/spring-task-3.0.xsd">

    <task:annotation-driven executor="myExecutor" scheduler="myScheduler"/>
    <task:executor id="myExecutor" pool-size="5"/>
    <task:scheduler id="myScheduler" pool-size="10"/>

</beans>


User Based Personalization Engine with Solr


You have a varied collection of media you want to personalize. It could be links, websites, friends, animals, recipes, videos, etc. The content has meta-data attributes that aren't very clean: sometimes brand names are abbreviated, categories are pluralized, casing differs, and so on. Media also has destination attributes; maybe it's only applicable for Florida residents, people who live within a 30-mile radius of Tampa, or, in a more complex example, people within a predefined market zone (a custom geo polygon representing a sales territory or market). Media can also be in different formats: web, email, video, kiosk, etc.


You need to personalize this content given the following dimensions:

  • The meta-data about the media, such as brand, flavor, category, feature, price, value, gtin, etc.
  • The current market information, such as clicks, impressions, purchases, views, average spend, etc.
  • Standard algorithms such as frequency, recency, and segmentation based on an individual user


  • As a personalization engine I want to be able to get all media applicable to a particular user, rank-sorted by its score.
  • As a personalization engine I want to be able to return relevant media that takes into account endpoint information such as lat/long, banner, etc. so that I don't recommend media that doesn't fit the channel, store, or sales objective.

Custom Implementation

Say you built this by hand; what do you need at minimum…

  • Code/Db/Schema to store documents representing media
  • Code for APIs around fetching/filtering/pagination of those documents
  • Code to import those documents
  • Scale-out of the scoring computation
  • Media filter semantics (for example, score documents with the word ‘RED BULL’ in it)
  • Auto-suggest/did-you-mean (useful to show related media or near duplicates)
  • Built in geo/location filter semantics (for example, score documents in a 5km radius of Tampa)
  • Flexible document model to allow scoring many different types of media with varying quality of meta-data
  • Ability to score based upon any number of analytic models both on media and user

All these things can be done, and for many that might be a fun way to learn and grow. For some companies it might even be the best approach, all things considered. However, today we won't talk about building a custom personalization engine; we'll explore what it would look like if you leveraged SOLR for this.

Personalization of Media Using Solr

Today we'll take a stab at user-based personalization in SOLR. Why? Because it solves most of the above, has built-in cloud scale, other people have done it, it has a mature API used by large companies, it has baked-in administration functions, and more. So how do we get started? First, some references and blogs about what we are trying to do.


Solr Building Blocks

Media Ingestion

Storing media is already covered by SOLR, courtesy of its built-in support for XML, CSV, JSON, JDBC, and a vast array of other formats. For ease, we can just post documents to Solr or use a JDBC endpoint. Connecting SOLR with Hadoop is an easy task as well thanks to the Hive JDBC driver, so regardless of where the media lives, it can be pushed or pulled with ease.
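As a sketch of what ingestion might look like from Java with SolrJ (the "media" core name and the field names are assumptions for illustration):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MediaLoader {
    public static void main(String[] args) throws Exception {
        // point at the (hypothetical) media core
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/media");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("category", "AUTO");
        doc.addField("advertiser_id", "UTC12341234");

        solr.add(doc);
        solr.commit();
    }
}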

Basic Network Filtering

To filter media by basic things like "only media for this advertiser" we can just use out-of-the-box Solr queries. So if our media has "advertiser_id" as an attribute we can simply do "/media/select?q=advertiser_id:UTC12341234". Solr is great at this. Further, if we want to only get media for a particular site or network we can just decorate those tags on the media and we'll be able to slice and dice it. Typically these "filters" are synonymous with "business rules", so we can also let external parties pass us this information and avoid having to be concerned with these details (which is great: no need to worry about it or create custom APIs).
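Filter queries (fq) stack nicely for this kind of slicing and dicing; the field names here are hypothetical:

/media/select?q=*:*&fq=advertiser_id:UTC12341234&fq=network:ACME&fq=format:web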

Geo/Location Filtering

SOLR has a wealth of geo/location filtering abilities, from bounding boxes to radius searches to custom polygon shapes. Media that has attributes like lat/long can be searched for, and if a user is in a particular area we can find relevant deals within (N) km of their current location. Really powerful stuff when combined with market zones!
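For example, assuming the documents carry an indexed spatial field called location, a 5 km radius around Tampa becomes a simple filter:

/media/select?q=*:*&fq={!geofilt sfield=location pt=27.95,-82.46 d=5}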

Media Management

Since all media ends up in SOLR, we can use native search functionality to manage and monitor it. Faceted search to power top-(N) media, insights into overlapping media, duplicates, and fuzzy matching allow us to see all the media at a glance and browse/pivot it however a business user feels they need to. Out-of-the-box UX experiences can be used or downloaded to drive this (hue/solr).
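A single facet query over the (assumed) category field is enough to power a quick top-(N) view:

/media/select?q=*:*&rows=0&facet=true&facet.field=category&facet.limit=10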

Generic Relevancy Algorithms

Solr comes with some fairly nice relevancy features out of the box (see the Solr Relevancy FAQ). Note it already has built-in functions for scoring relevancy on basic audience information like clicks/popularity. So you could probably stop here if you just wanted to, let's say, score media by overall clicks in the past hour. In fact LinkedIn and others use this, and there is a nice slide deck on Implementing Click-Through Relevance Ranking.

Domain Specific Relevancy

So we are 99% there, but let's say we need to tailor the scores and have finer, more mathematical control over scoring. We can do this by implementing domain-specific language concepts in SOLR. It already has the plug-and-play semantics for this, so that in real time we can mash a user's preferences, behavior data, and segmentation information with each piece of media to compute a score, or many scores. This is opened up to us by implementing Solr Function Queries. Solr already has many out of the box; the only piece that is missing is being able to get your user information mixed in with your media.

And because Solr has built-in support for this, we can filter, sort, and return these mathematical gems to build up an expressive library of domain-specific functions.

Example: Recency Function

Let's start with a basic example: we want to compute the recency in days since the last click on a particular category. We need to be able to tell our function "who", so it can look up the user's information (from a database, API, etc.), and we also need to tell it "what" we want to score on.
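A request along those lines might look like the following; the function name and arguments are the placeholders discussed below:

/media/select?q=*:*&fl=*,myvalue:click_recency(USERID,'category')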


In this instance "myvalue" is the value returned back to us (we aren't sorting or filtering yet). "click_recency" is our custom function. "USERID" is the user identifier, which will be used to look up whatever information you have about the consumer's category clicks, and finally "category" is the field name in the SOLR media index to weight against.

Assume we have a document index as follows:

media #   |  category   |  .... |  .... | ...
1         |  AUTO       |  .... |  .... | ...
2         |  PET        |  .... |  .... | ...
3         |  ANIME      |  .... |  .... | ...

Assume we have access to an API that will return information about a particular user (maybe from a database, a NoSQL store, or some paid provider like a DMP).

    "id": "1234",
    "clicks": {
       "category": {
          "CLOTHES": 10,
          "SOFTWARE": 9,
          "PET": 9

In our simple example, our user model doesn't contain anything like when clicks were made; just aggregates. But depending on the richness of your user model you could certainly create computations that take frequency, recency, etc. into account.


import org.apache.lucene.queries.function.ValueSource;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

public class MyUserFunction extends ValueSourceParser {
    private MyUserService users;

    // called when our function is initialized so we can
    // configure it by telling it where external sources are
    // or maybe how much/long to cache, or the uid/pwd to access, etc...
    @Override
    public void init(NamedList configuration) {
       // get your user information api endpoint here
       String api_endpoint = configuration.get("userinfo.url").toString();
       users = new MyUserService(api_endpoint, ..., ...);
    }

    @Override
    public ValueSource parse(FunctionQParser fp) throws SyntaxError {
       String user_id = fp.parseArg();
       String field_to_eval = fp.parseArg();
       User user = users.get(user_id);
       return new UserRecencyValueSource(user, field_to_eval);
    }
}

public class UserRecencyValueSource extends ValueSource {
   private final User user;
   private final String field_to_eval;

   public UserRecencyValueSource(User user, String field_to_eval) {
     this.user = user;
     this.field_to_eval = field_to_eval;
   }

   public FunctionValues getValues(Map context, AtomicReaderContext reader) throws IOException {
      // calculate a map of the user's recency, keyed by category
      final Map<String, Double> clicks_by_category = // ... built from the user's click history;
      return new DocTermsIndexDocValues(this, reader, field_to_eval) {
         public Object objectVal(int doc) {
             String field_value = strVal(doc);
             // has this person clicked on it? if not, just return 0
             if (!clicks_by_category.containsKey(field_value)) return 0D;
             return clicks_by_category.get(field_value);
         }
      };
   }

   // equals(), hashCode(), and description() overrides omitted for brevity
}

To enable our component we update solrconfig.xml and register our custom function:

  <valueSourceParser name="click_recency" class="com.foo.bar.MyUserFunction">
    <str name="userinfo.url">http://..../users</str>
    <str name="userinfo.uid">test</str>
    <str name="userinfo.pwd">test</str>
  </valueSourceParser>

Now with this in place, we can sort, filter, and more using our custom function, and because it's implemented in a standard way we can combine it with all the other SOLR functions, plus any others you might have in your library. So…

&sort=log(max(click_recency(tsnyder,'category'), click_recency(tsnyder,'brand'))) desc

So in the above hypothetical we sort all documents, in descending order, by the base-10 log of the maximum value returned by either category click recency or brand click recency. Now imagine your library grew to contain frequency, spend, quantity, and more. Considering SOLR also has functions such as scale, cos, tan, etc., we can now score documents in a nearly infinite number of flexible ways.

Final Thoughts

If you are still questioning how powerful this concept is, go check out Solr Image Search and the live demo of image search in Solr, which uses SOLR Query Functions to find patterns within images and return similar images.


AWS CLI and JQ For Automation of Environments

So I've had a problem recently; it might be familiar to others.

I constantly need to provision brand new environments, and I always run into a snag. Vagrant keeps state in a folder under .vagrant, and for the vagrant cloud plugins this holds the AWS instance id it THINKS it should provision to; if the folder doesn't exist, a new instance is created.

The problem comes in when I shut down an instance in the EC2 console, the EC2 instance itself blows up, or I want to provision a new environment. Any of these actions causes the Jenkins state to go out of sync with the real world, because the instance id in the .vagrant folder no longer matches reality – and when this happens we lose all ability to provision or re-provision.

I've been solving this by wiping out my workspace every time this happens and provisioning a brand new environment. It's not that bad, except we have clusters of machines that require this wipe-and-redo. We also waste time pulling down git content that hasn't actually changed. And worst of all, it's manual, and that involves the whole tribal-knowledge thing… yuck.

Consider more complex environments where we end up running multiple Jenkins instances: one Jenkins might have the .vagrant folder and another doesn't. Or maybe the worst happens and our Jenkins box gets knocked out. Without this state we would bring up multiple instances (both expensive and possibly leading to errors). So what to do?

With a little bash voodoo we can scan Amazon for instances using the AWS CLI, looking for instances already in the 'pending' or 'running' state, and then let vagrant know what the state of the world really is.

We can "find" our instances by issuing the command below using the excellent aws cli along with jq.

aws ec2 describe-tags --filters Name=resource-type,Values=instance | \
         jq '.Tags[] | {Key,Value,ResourceId}' | \
         jq '. | select(.Key=="Name")' | \
         jq '. | select(.Value=="YOURNAMEHERE").ResourceId'

This little gem will return either nothing or the instance id of the machine you are looking for (our instances are all uniquely named). So now we can use this to conditionally run either a vagrant up YOURNAMEHERE or a vagrant provision YOURNAMEHERE depending on the result.

The trick to getting provisioning to work from scratch (let's say you configure Jenkins to reset your git workspace every build) is to create the correct file in .vagrant/machines/YOURNAMEHERE/aws/id when the above yields the id of an active instance.


cd puppet
chmod +x *.sh
cd "boxes/$NAME"

  NODE_ID=$(printf "%02d" $ID)
  echo "$EC2_TAG_NAME"

  # start from a clean slate so stale instance ids can't leak through
  rm -rf ".vagrant/machines/$NAME/aws" || true
  mkdir -p ".vagrant/machines/$NAME/aws"

  # look up a running/pending instance whose Name tag matches this node
  INSTANCE_ID=$(aws ec2 describe-instances --filters Name=instance-state-name,Values=running,pending | jq '.Reservations[]' | jq '.Instances[] | { InstanceId, Tags }' | jq 'select(has("Tags")) | select(.Tags[]["Key"] == "Name" and .Tags[]["Value"] == "'"$EC2_TAG_NAME"'") | .InstanceId')
  INSTANCE_ID=$(echo $INSTANCE_ID | sed "s/\"//g")

  if [ -n "$INSTANCE_ID" ]; then
    echo "=================== PROVISION ENVIRONMENT ======================="
    # tell vagrant which instance it owns, then re-provision it
    echo $INSTANCE_ID > ".vagrant/machines/$NAME/aws/id"
    DEPLOYMENT_OUTPUT=`ENV=$ENV NODE=$NODE_ID vagrant provision $NAME`
    test 0 -eq `echo $DEPLOYMENT_OUTPUT | grep "VM not created" | wc -l` -a 0 -eq $?
  else
    echo "=================== BRAND NEW ENVIRONMENT ======================="
    DEPLOYMENT_OUTPUT=`ENV=$ENV NODE=$NODE_ID vagrant up $NAME --provider=aws`
    test 0 -eq `echo $DEPLOYMENT_OUTPUT | grep "VM not created" | wc -l` -a 0 -eq $?
  fi


Now we can provision and re-provision and also use the standard Amazon console – we can blow away instances in the EC2 console, and on the next Jenkins push the script will detect no instances and automatically provision the new environment.

It's important to note that the above allows us to spin up (N) vagrant instances. Useful, for example, when node 09 was terminated: when we run this, the script will provision 01-08 and bring up a new 09.


Scaling Development with Vagrant/Puppet and Ubuntu Desktop

Use Case

The general use case for development, whether with a team or as a single individual, is repeatable and scalable work. With home and work blurring, I often find myself working at home and in the office and need a way to quickly get a development environment up and running. Given that I work mostly in the Linux realm, we have found it quite effective to use vagrant and puppet for dev-ops. However, this has been limited to server environments where we want to provision new environments for running a cluster.

That being said, I still work mostly on Windows, and Hadoop doesn't have quite the same level of support there for unit testing and quality tooling (HDFS, for example, is difficult). So I've opted to try configuring a developer environment with Ubuntu Desktop Precise (12.x).


Git Branching Model


  • Vagrant (latest stable)
  • VirtualBox (latest stable)
  • ISO for Ubuntu Desktop Precise (12.x)


To configure the launch bar for Eclipse, create the file below, chmod +x it, and drag it to the sidebar launcher.


[Desktop Entry]
Comment=Eclipse Integrated Development Environment
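Filling that out into a working entry (the Exec and Icon paths assume Eclipse is unpacked under /opt/eclipse; adjust for your install):

[Desktop Entry]
Type=Application
Name=Eclipse
Comment=Eclipse Integrated Development Environment
Exec=/opt/eclipse/eclipse
Icon=/opt/eclipse/icon.xpm
Terminal=false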

Configure your system with a new font (Source Code Pro).



# set URL to the Source Code Pro download archive and FONT_NAME to the zip's base name
mkdir -p /tmp/adodefont
cd /tmp/adodefont
wget ${URL} -O ${FONT_NAME}.zip
unzip -o -j ${FONT_NAME}.zip
mkdir -p ~/.fonts
cp *.otf ~/.fonts
fc-cache -f -v


Standard Maven : Multi-Module Maven Projects and Parent POMs


So you have a platform you are trying to build – most of it written in Java, maybe a little nodejs, zeromq, or some other such goodies – but you want to manage your Java resources using as much out-of-the-box engineering as possible, to reduce the boilerplate and headaches that come with a compiled language. You decide to go with maven because, well, it's better than JAR hell, it has more cool kids using it (github, etc.), tooling support is more than decent, and you've got dependency support from most if not all Apache projects.


  • The system should be able to build the entire project and all dependencies in one go.
  • The system should be able to load the entire solution into an IDE (Eclipse)
  • The system should follow DRY and KISS – things should be in one place, and things should only do one thing and do it well
  • The system should be able to create a pit of success (proper documentation, unit tests, code coverage, reports, site creation, etc)


The example solution structure below is the same as that used by Hibernate and some top-level Apache projects.

 ./pom.xml            (solution level pom, glue for which modules are part of solution)
 ./modules            (all modules for the solution)
   ./core             (common shared lib, usually domain objects, etc)
     ./pom.xml        (standard maven module pom with ref to parent)
   ./api              (api for exposing jersey, jax-ws, jax-b services)
     ./pom.xml        (standard maven module pom with ref to parent)
   ./parent           (parent pom container)
     ./pom.xml        (standard parent pom)


Solution POM

The solution POM simply exists to define the list of modules. Used primarily by build tools as well as IDEs, this POM contains a listing of all modules that make up the solution, combined with default properties and/or build profiles, the default group id, packaging, etc.
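A minimal sketch of such a solution POM (group and artifact ids are placeholders) could look like:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>solution</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <modules>
    <module>modules/parent</module>
    <module>modules/core</module>
    <module>modules/api</module>
  </modules>
</project>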

Parent POM

Contains all common settings, build plugins, common libraries, frameworks, test tools, etc. Normally this includes standards like custom repositories and which Java version to target, as well as common build dependencies such as JUnit and logging frameworks.

Finally, the parent POM usually specifies the site generators, javadoc options, team members, etc. for site generation.
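A trimmed-down parent POM along those lines (ids and versions are illustrative) might contain:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>parent</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <properties>
    <!-- which Java version every module targets -->
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
  </properties>

  <dependencies>
    <!-- common test dependencies shared by every module -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>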

Module POM(s)

Specific to each logical component in the overall app/solution architecture. Normally separated out along classic N-tier lines: a domain object module, business rule module, data module, api module, webapp module, etc.
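Each module POM then just points back at the parent; a sketch for the hypothetical core module:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.example</groupId>
    <artifactId>parent</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <relativePath>../parent/pom.xml</relativePath>
  </parent>
  <artifactId>core</artifactId>
  <packaging>jar</packaging>
</project>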

Continuous Delivery

  • Enabled by applying top level code quality checks in the parent POM (checkstyle, PMD, etc)
  • Enabled by applying top level doc generation in the parent POM (JavaDocs, maven sites, etc)

Self Assembling Nodes for Elastic Compute Resource Resolution

So let's set this up as a simple problem statement:

You have an elastic compute grid, which allows you to add and remove resources at will. In general, as you add capacity by adding resources like MySQL instances, memcache, redis, etc., you want your various tiers to be notified of these additions and subtractions, so that your applications can sort of just "hang out" and wait for the resources they need to become available. You don't want to have to manually configure IPs, ports, DNS, etc. Simply "hot-swap" these resources in and feel assured that (N) or more clients can find the available services out there in the wild.

SOA had the concept of a bus and discovery, which usually goes something like either (A) use a single registry to perform lookup/routing or (B) use UDP broadcasts to let your applications “self-discover” their environment.

I am now a huge fan of puppet and nodejs along with zeromq, and I believe the solution is made much simpler thanks to the epgm/pgm protocols.

Let's take a look at some nodejs code…

Infrastructure subscriber
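A minimal subscriber sketch using the zmq node module (the interface name, multicast address, port, and tag are placeholder assumptions, and libzmq must be built with PGM support for epgm endpoints):

var zmq = require('zmq');
var sub = zmq.socket('sub');

// epgm://<interface>;<multicast address>:<port>
sub.connect('epgm://eth0;239.192.1.1:5555');
sub.subscribe('infrastructure.solr');   // the tag we care about

sub.on('message', function (msg) {
  // e.g. "infrastructure.solr 10.0.0.11 8983"
  console.log('resource announced:', msg.toString());
  // ...update facts / kick off puppet here
});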

So with those lines of code you can rest assured that when infrastructure is added with the specified tag (it can be any string), you will be able to pick it up and do something with it (i.e. configure it).

Infrastructure publisher
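And a matching publisher sketch, announcing this node on the same (placeholder) multicast group every few seconds:

var os = require('os');
var zmq = require('zmq');
var pub = zmq.socket('pub');

pub.connect('epgm://eth0;239.192.1.1:5555');

// announce ourselves periodically so late subscribers still find us
setInterval(function () {
  pub.send('infrastructure.solr ' + os.hostname() + ' 8983');
}, 5000);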

Given this simple and effective setup, as you add nodes to Amazon EC2, Rackspace, or whatever provider, and provided you are within broadcast range, you should be able to pick up instances dynamically, perform an operation, then wait for those instances to disappear or go offline and react accordingly.

Use Case

As a user I have the option of hosting many SOLR instances for a big-data solution. One of the problems with managing a large number of SOLR instances is the complexity of the infrastructure. In order to create shards and distributed queries I need to update my solrconfig files with all SOLR instances that are available, to ensure my queries cover the entire spectrum. However, as I add or remove nodes in my cloud I want this configuration to stay in sync. Normally this requires automation, but I don't want a human typing in IP addresses or a complex management structure to handle the complexity. What I want is the ability to drop in SOLR instances, pluck them out, and have everything elastically adjust and compensate.

How the solution maps

With the proposed solution we now have a means such that as SOLR instances get added we can invoke a puppet run to update our SOLR configuration (using FACTER). The moment a change is detected we update FACTER with the latest information, and then execute PUPPET, which does the required re-configuration/uninstallation/etc.
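The glue can be as small as writing an external fact and triggering a puppet run; the fact name, addresses, and manifest path here are purely hypothetical:

# record the discovered SOLR nodes as an external fact for FACTER
echo "solr_nodes=10.0.0.11,10.0.0.12" > /etc/facter/facts.d/solr_nodes.txt

# let puppet re-render the SOLR configuration from $::solr_nodes
puppet apply /etc/puppet/manifests/site.pp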

But what about zookeeper?

The latest SOLR trunk (4.0) includes zookeeper for this kind of thing, but that introduces complexity: we need leader election and fail-over for the zookeeper farm, and we get a central point of contention. While this may be fine for some installations, in others, where the farm itself could be upwards of 100 servers (hundreds of billions of SOLR docs), we want to handle outages gracefully.

Using this approach we can also have our instances geographically dispersed by creating a "forwarder" that listens on a separate network and receives the same pub/sub updates.

Looking forward to open-sourcing this to github as time goes on.


Tomcat 6 & 7 – Best Practices


This guide shows you how to set up and run your Tomcat instances in a multi-tenant environment. It also shows you how to effectively design your provisioning, monitoring, and security model around Tomcat in a flexible manner.

The information in this guide is based on practices learned from customer feedback and product support, as well as experience from the field and in the trenches. The guidance is task-based and presented in the following parts.

  • Part I – Tomcat Overview – gives you a quick overview of Tomcat as provided out-of-the-box by Apache, how it launches, and how it can be configured.
  • Part II – Security – gives you a quick overview of fundamental security concepts as they relate to Tomcat.
  • Part III – Configuration – gives you a quick overview of fundamental configuration options and concepts as they relate to performance, management, monitoring, and application configuration.
  • Part IV – Scenarios – gives you an overview of key scenarios with the use of Tomcat, with a focus on Intranet/Internet scenarios and clustering.

Scope of This Guide

This guide focuses on delivering a production-ready strategy for deploying Tomcat at large scale in a sensible and extensible fashion. The approach can also be used on a local developer workstation (either Windows or Linux) to create a baseline.

Out of Scope

  • Java Language
  • Application Specific Scenarios

Why We Wrote This Guide

From our own experience with Tomcat and through conversations with customers and co-workers who work in the field, we determined there was demand for a guide that would show how to use Tomcat in the real world. While there is information in the product documentation, in blog posts and in forums, there has been no single place to find proven practices for the effective use of Tomcat in the context of line of business applications under real world constraints.

Who Should Read This Guide

This guide is targeted at individuals involved in building applications with Tomcat. The following are examples of roles that would benefit from this guidance:

  • A development team that wants to adopt Tomcat.
  • A software architect or developer looking to get the most out of Tomcat, with regard to designing their infrastructure, enhancing performance, and deployment scenarios.
  • Interested parties investigating the use of Tomcat but don’t know how well it would work for their deployment scenarios and constraints.
  • Individuals tasked with learning Tomcat.
  • An enterprise architect looking to promote the use of Tomcat in their line-of-business applications.
  • Individuals tasked with automating the deployment of applications into Tomcat.

How To Use This Guide

Use the first part of the guide to gain a firm foundation in key concepts around Tomcat. Next, use the application scenarios to evaluate potential designs for your scenario. The application scenarios are skeletal end-to-end examples of how you might design your authentication, authorization and communication from a production Tomcat perspective. Use the appendix of “Guidelines”, “Practices”, “How To” articles and “Questions and Answers” to dive into implementation details. This separation allows you to understand the topics first and then explore the details as you see fit.

Part I – Tomcat Overview


  • Learn the basic requirements for Tomcat
  • Learn what is provided by Tomcat out-of-the-box
  • Learn how to start and stop a basic Tomcat instance


This chapter provides a set of foundational building blocks on which to base your understanding of Tomcat. Additionally, this chapter introduces various terms and concepts used throughout this guide.

Tomcat Minimum Requirements

Tomcat Layout

The key to building a scalable Tomcat solution is understanding two environment properties used by Tomcat and what they mean.

  • CATALINA_HOME – represents the static files used by the Tomcat server, such as the lib and bin directories.
  • CATALINA_BASE – by default set to CATALINA_HOME; pointing it elsewhere allows for the logical separation of Tomcat itself from line-of-business applications, as shown in the example below.
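In practice you point both variables at the right places when starting an instance; for example (the paths follow the multi-instance layout used later in this guide):

export CATALINA_HOME=/opt/dev/tomcat/apache-tomcat-7.0.14   # shared Tomcat binaries
export CATALINA_BASE=/opt/dev/tomcat/7051                   # one application instance
$CATALINA_HOME/bin/startup.sh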


Under CATALINA_HOME:

  • bin – This directory is never used during runtime. Its sole purpose is to store the scripts that start up and shut down Tomcat and resolve classpaths.
  • lib – Contains the jars that make up Tomcat Server and any dependencies Tomcat Server has.
  • endorsed – By default a folder that does not exist when Tomcat is first downloaded. It is a reserved folder that, when created, allows global-level JARs to be placed and loaded by all Tomcat instances. It is a best-practice location for JDBC drivers and any other classes that may utilize the J2SE 1.4 endorsed feature. In short, there are some package names whose classes are only loaded if the package files are located in the endorsed directory.

Under CATALINA_BASE:

  • conf – Contains configuration information for the specific application instance.
  • webapps – This is where you place the applications that run within Tomcat (defined by server.xml).
  • work – A server-level working directory for your web applications (defined by server.xml).
  • logs – Runtime logs generated when you run Tomcat (defined by logging.properties).
  • temp – This directory is used by the JVM for temporary files (defined by java.io.tmpdir).


  • Intranet – Line of Business Web Application Server
  • Intranet – Line of Business Web Services Server
  • Internet – Application Web Application Server
  • Internet – Application Web Services Server

Intranet – Line of Business Web Application Server


You want to provide a means to host web-based applications within your company. These applications all require a Java servlet container, and you wish to standardize deployment, configuration, and hosting onto a single unified development platform. This includes standardizing Tomcat across environments and the ability to support multiple lines of development with minimal impact to each business area.

Key Characteristics

This scenario applies to you if:

  • Operate within a corporate IT department
  • Need a standard Java servlet hosting environment
  • Are restricted by resources and/or skill-sets and need the simplest possible option
  • Need a means to have a stepping stone to more robust vendor solutions
  • Have many paths of development and many applications in different stages of development
  • Require occasional customizations of servlet containers and can not afford to offload work to individual teams
  • You need the ability to run multiple JVMs – largely because some teams and/or some applications may need to run on legacy JVMs and you do not want to restrict other teams or applications from using a newer JVM
  • You need the ability to run multiple versions of tomcat quickly and easily with a standard interface
  • You need the ability to run multiple versions of tomcat to upgrade without having to re-implement core configuration options.
  • You need the ability to upgrade to a vendor-provided solution like SpringSource, should the resources become available, without additional effort.


Leverage Tomcat's built-in ability to separate the core Tomcat server from web-application specifics through the use of the out-of-the-box CATALINA_HOME and CATALINA_BASE configuration options.

Intended Benefits

By implementing this solution, the following benefits can be realized by the organization:

  • You may find that the separation of applications makes outages less painful and more granular, increasing operational efficiency
  • A clear understanding of the environment makes communication across departments easier
  • Through standardization comes the possibility of creating automated solutions for provisioning new application servers, increasing efficiency and reducing impacts to workers
  • Separating each application into its own container allows for finer-grained performance tuning and customization of a particular application
  • Operations will now be able to effectively troubleshoot performance problems down to a particular application; memory-leak, performance, and health metrics can be tailored and targeted to diagnose and resolve issues without impacting other parties.
  • Development efforts can happen in parallel with minimal impact to other teams; shutting down and restarting a development or SQA server for an application only affects that application.

Possible Side Effects

By creating a script-based solution you may encounter the following smells:

  • Individuals or teams may start referring to their applications using specific port numbers rather than canonical names.
  • By allowing unlimited expansion the number of web applications may grow and the need for more hardware may be required
  • By making it easier to deploy applications you may forget application justification processes; always justify the creation of any new application or service, and always do a buy-vs-build analysis

Other alternatives

Buy vs. build analysis on cloud based solutions and/or vendor provided solutions like springsource tcserver.

Solution Implementation Details

To provide this solution to your organization, download from Apache the appropriate versions of Tomcat you wish to provide, along with the JVM of your choice. We are going to assume the latest installation as of this writing, which is Tomcat 7.0.14.

  1. Download Tomcat 7.0.14
  2. Create a new folder /opt/dev/tomcat
  3. Unzip apache-tomcat-7.0.14 into /opt/dev/tomcat
  4. Create a new file run.sh inside /opt/dev/tomcat
  5. Create a new file server.sh inside /opt/dev/tomcat
  6. Create a new directory /opt/dev/tomcat/shared – this will hold server wide configuration information
  7. Copy server.xml from apache-tomcat-7.0.14/conf to ./shared
  8. Copy logging.properties from apache-tomcat-7.0.14/conf to ./shared

Provisioning a new Tomcat Instance

The convention used for creating a new Tomcat instance is to create a new folder that contains the core CATALINA_BASE folder structure. This folder is given the same name as the port you wish Tomcat to use.

This allows for dynamic provisioning of new Tomcat instances by simply creating the default CATALINA_BASE directory structure within a folder named after the port number you wish to use.

By using this convention a system administrator can quickly understand which ports are claimed by Tomcat and ensure there are no overlaps in port numbers. It also means any startup/shutdown scripts will pick up the new instance without modification of any /etc/init.d/ scripts, so a restart of the server will also pick up these changes.

Example – Provision A New Instance on Port 7051

  1. Create a new directory called /opt/dev/tomcat/7051
  2. Create the following directories [./bin, ./conf, ./logs, ./temp, ./work, ./webapps] inside 7051: mkdir -p /opt/dev/tomcat/7051/{bin,conf,logs,work,webapps,temp}
  3. Copy the default web.xml from your tomcat install directory cp /opt/dev/tomcat/tomcat-7.0.14/conf/web.xml /opt/dev/tomcat/7051/conf
  4. Optionally – create a new file setenv.sh inside ./7051/bin which will hold application-specific customizations such as garbage collection options (see the example setenv.sh sketch below)
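The setenv.sh referenced in step 4 can be as small as a single JAVA_OPTS export; the JVM options below are purely illustrative:

# /opt/dev/tomcat/7051/bin/setenv.sh
# per-instance JVM tuning, picked up automatically by catalina.sh
export JAVA_OPTS="-Xms256m -Xmx512m -XX:+UseConcMarkSweepGC"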

Now that configuration and provisioning are complete, you should be able to start Tomcat:
/opt/dev/tomcat/run.sh 7051 start
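A minimal sketch of what such a run.sh might look like, given the port-named CATALINA_BASE convention (the paths are assumptions):

#!/bin/bash
# usage: run.sh <port> <start|stop>
PORT=$1
ACTION=$2

export CATALINA_HOME=/opt/dev/tomcat/apache-tomcat-7.0.14
export CATALINA_BASE=/opt/dev/tomcat/$PORT

case "$ACTION" in
  start) "$CATALINA_HOME/bin/startup.sh"  ;;
  stop)  "$CATALINA_HOME/bin/shutdown.sh" ;;
  *)     echo "usage: $0 <port> <start|stop>"; exit 1 ;;
esac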

What remains is to deploy your application(s) to this tomcat instance by copying the wars to:



Automatic Startup/Shutdown of Multi Instance Tomcat

Enterprise applications need the ability to survive a restart of the server. By combining what you have learned about Tomcat with the ability to run multiple instances, you can now turn to starting and stopping Tomcat via /etc/init.d.

  1. Create a new file in /etc/init.d called tomcat
  2. Run sudo chmod 755 /etc/init.d/tomcat
  3. Run /sbin/insserv -d tomcat
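A bare-bones sketch of what the /etc/init.d/tomcat script from step 1 might contain, reusing run.sh for every port-named instance (LSB INIT INFO headers omitted; paths are assumptions):

#!/bin/bash
# start/stop every port-named Tomcat instance under /opt/dev/tomcat
ACTION=${1:-start}

for BASE in /opt/dev/tomcat/[0-9]*; do
  PORT=$(basename "$BASE")
  /opt/dev/tomcat/run.sh "$PORT" "$ACTION"
done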

Fix Performance for SecureRandom creation of Sessions Taking Too Long

03-Jul-2011 20:33:19.272 INFO org.apache.catalina.util.SessionIdGenerator.createSecureRandom Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [68,810] milliseconds.

The problem is that it took 69 seconds for Tomcat to start because of session id generation. This really isn't acceptable, and it has been dismissed by Sun as "not a bug".
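A common workaround is to point the JVM at the non-blocking urandom device, for example via the per-instance setenv.sh described earlier:

# in $CATALINA_BASE/bin/setenv.sh
export JAVA_OPTS="$JAVA_OPTS -Djava.security.egd=file:/dev/./urandom"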