Developer Documentation

Guide

For developers

Architecture Diagram:

tdar infrastructure.pptx

Set up the development environment

Download and install Maven 2 from http://maven.apache.org
Check out the codebase from our SVN repository (svn+ssh://dev.tdar.org/svn/tdar.struts/trunk)
Set up the database by performing the following:
1. Make sure you have a tdar user with CREATEDB permissions on Postgres.
2. Run the Maven setup task:
```
mvn -Psetup-new-instance compile
```
3. Set up the tdarmetadata and tdardata databases (this step should be managed automatically if possible).
Run mvn eclipse:eclipse to generate Eclipse's .project and .classpath files, then add an M2_REPO variable via Build Path -> Add Variable and set it to your local Maven 2 repository. On Unix-like machines, this should be ~/.m2/repository. If you're using the m2eclipse plugin, this step usually isn't necessary.
(GRID release and earlier) To make Google Maps work on your development box, copy src/main/webapp/includes/googlemaps-api-key.js.template to src/main/webapp/includes/googlemaps-api-key.js, then change the gmapsKey variable to the Google Maps API key for your host. You can get a Google Maps API key at http://code.google.com/apis/maps/signup.html
Important Maven targets include:
1. mvn compile
2. mvn test
3. mvn jetty:run to deploy the webapp on the port specified in pom.xml
4. mvn verify to run the unit and integration tests

Maven dependencies

To add a new library dependency, first look it up via a repository search engine such as http://mvnrepository.com or http://search.maven.org (see http://maven.apache.org/general.html#How_to_find_dependencies for more repository search engines). If the library is listed there, add its pom snippet to pom.xml; otherwise you may need to install it manually in our local Maven staging repository. You can also add Maven repositories to Archiva as a proxy connector, which keeps our pom.xml simpler.

Code Formatting:

Go into Eclipse preferences
Navigate to Java > Code Style > Formatter
Click Import and import the Eclipse format file

Resources and Links

Infrastructure Documentation

Coding Standards

Design Document

Technology stack

Hibernate

We currently use Hibernate 3 for object-relational mapping. Relationships are specified via JPA annotations. However, the DAOs that store, retrieve, and update persistent entities use a Hibernate SessionFactory behind the scenes rather than a JPA EntityManager, which would be the more framework-agnostic choice.

More information about the tDAR Hibernate implementation can be found at the tDAR Hibernate Documentation Page.

Postgres

PostGIS

Spring

We currently use Spring for three things: to make Hibernate easier to use, for dependency injection/IoC (managing our beans, services, DAOs, data sources, and Hibernate sessions), and for transaction management.

Struts 2

We are currently using Struts 2.2.1 as the web application framework. More information about the Struts implementation can be found at the tDAR Struts Documentation Page.

A quick primer on the web technologies used by struts: http://struts.apache.org/primer.html

jQuery

http://eng.wealthfront.com/2010/10/jquery-right-way.html

Protege

Protege web tool: http://bmir-protege-dev1.stanford.edu/webprotege/

Other Web Frameworks

Mercurial

Our branching strategy generally uses at most three branches:

a. DEFAULT: the current unstable development branch

b. PRODUCTION: the branch that represents the production version of the code (for example, HARRIS)

c. RELEASE: the branch that's being prepared for production. It is created about one month before the release date and is pruned (if necessary) of unstable code that will not go into production, for example the Video code, which was pruned so that it never shows up.

Once a release is made, the production and DEFAULT branches are kept in parallel (with patches moving back and forth regularly for bug fixes) and the old production branch is retired.

URL Rewriting: http://tuckey.org/urlrewrite/

Performance tuning

Massively parallel data stores

Hadoop+HBase vs RDBMS
Hadoop applicability
We should look into Hadoop for ideas about scaling data integration and/or representation of datasets

Database Connection Pooling

We're currently using c3p0 connection pooling. From their documentation:

Configuring Connection Testing
c3p0 can be configured to test the Connections that it pools in a variety of ways, to minimize the likelihood that your application will see broken or "stale" Connections. Pooled Connections can go bad for a variety of reasons – some JDBC drivers intentionally "time-out" long-lasting database Connections; back-end databases or networks sometimes go down "stranding" pooled Connections; and Connections can simply become corrupted over time and use due to resource leaks, driver bugs, or other causes.
c3p0 provides users a great deal of flexibility in testing Connections, via the following configuration parameters:
automaticTestTable
connectionTesterClassName
idleConnectionTestPeriod
preferredTestQuery
testConnectionOnCheckin
testConnectionOnCheckout
idleConnectionTestPeriod, testConnectionOnCheckout, and testConnectionOnCheckin control when Connections will be tested. automaticTestTable, connectionTesterClassName, and preferredTestQuery control how they will be tested.
When configuring Connection testing, first try to minimize the cost of each test. By default, Connections are tested by calling the getTables() method on a Connection's associated DatabaseMetaData object. This has the advantage of working with any database, and regardless of the database schema. However, empirically a DatabaseMetaData.getTables() call is often much slower than a simple database query.
The most convenient way to speed up Connection testing is to define the parameter automaticTestTable. Using the name you provide, c3p0 will create an empty table, and make a simple query against it to test the database. Alternatively, if your database schema is fixed prior to your application's use of the database, you can simply define a test query with the preferredTestQuery parameter. Be careful, however. Setting preferredTestQuery will lead to errors as Connection tests fail if the query target table does not exist in your database table prior to initialization of your DataSource.
Advanced users may define any kind of Connection testing they wish, by implementing a ConnectionTester and supplying the fully qualified name of the class as connectionTesterClassName. If you'd like your custom ConnectionTesters to honor and support the preferredTestQuery and automaticTestTable parameters, implement UnifiedConnectionTester, most conveniently by extending AbstractConnectionTester. See the api docs for more information.
The most reliable time to test Connections is on check-out. But this is also the most costly choice from a client-performance perspective. Most applications should work quite reliably using a combination of idleConnectionTestPeriod and testConnectionOnCheckin. Both the idle test and the check-in test are performed asynchronously, which leads to better performance, both perceived and actual.
Note that for many applications, high performance is more important than the risk of an occasional database exception. In its default configuration, c3p0 does no Connection testing at all. Setting a fairly long idleConnectionTestPeriod, and not testing on checkout and check-in at all is an excellent, high-performance approach.
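As a rough illustration of the settings quoted above, the sketch below assembles the two suggested configurations as plain java.util.Properties, using the c3p0.-prefixed keys that a c3p0.properties file accepts. The class and method names are hypothetical, the 300-second period and the "SELECT 1" query are arbitrary example values, and none of this reflects tDAR's actual pool configuration.

```java
import java.util.Properties;

/**
 * Sketch only: expresses the c3p0 connection-testing options as Properties.
 * Class and method names are hypothetical; only the property keys come
 * from the c3p0 documentation quoted above.
 */
final class C3p0TestingSketch {

    /** High-performance variant: idle testing only, nothing on check-out or check-in. */
    static Properties idleTestOnly(int idleTestPeriodSeconds) {
        Properties p = new Properties();
        // idleConnectionTestPeriod is expressed in seconds.
        p.setProperty("c3p0.idleConnectionTestPeriod", Integer.toString(idleTestPeriodSeconds));
        p.setProperty("c3p0.testConnectionOnCheckout", "false");
        p.setProperty("c3p0.testConnectionOnCheckin", "false");
        return p;
    }

    /** More cautious variant: idle plus check-in testing with a cheap test query. */
    static Properties idleAndCheckinTest(int idleTestPeriodSeconds, String testQuery) {
        Properties p = idleTestOnly(idleTestPeriodSeconds);
        p.setProperty("c3p0.testConnectionOnCheckin", "true");
        // A simple query avoids the slower default DatabaseMetaData.getTables() test.
        p.setProperty("c3p0.preferredTestQuery", testQuery);
        return p;
    }

    public static void main(String[] args) {
        // Example values only (assumptions, not project settings).
        idleAndCheckinTest(300, "SELECT 1").list(System.out);
    }
}
```

The printed key/value pairs could be copied into a c3p0.properties file on the classpath. Whether check-in testing is worth enabling depends on how often connections actually go stale in a given deployment.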