Developer Documentation
Guide
For developers
- tDAR development configuration
- tDAR database schema
- Source code structure
- Unit and Integration Testing
- Data integration use case
- Deploying tDAR on a server
- Eclipse Setup / Configuration
Architecture Diagram:
tdar infrastructure.pptx
Set up the development environment
- Download and install Maven 2 from http://maven.apache.org
- check out the codebase from our SVN repository (svn+ssh://dev.tdar.org/svn/tdar.struts/trunk)
- set up the database by performing the following:
- make sure you have a tdar user with CREATEDB permissions on postgres
- run the maven setup task:
mvn -Psetup-new-instance compile
- set up the tdarmetadata and tdardata databases (this step should be managed automatically if possible)
- use
mvn eclipse:eclipse
to generate Eclipse's .project and .classpath files, then add M2_REPO to the variables via Build Path -> Add Variable and set it to your local Maven 2 repository. On Unix-ish machines, this should be ~/.m2/repository. If you're using the m2eclipse plugin, this step usually isn't necessary.
- (on GRID and before) to make google maps work on your development box, copy
src/main/webapp/includes/googlemaps-api-key.js.template
to
src/main/webapp/includes/googlemaps-api-key.js
and then customize it by changing the gmapsKey variable to the google maps api key for your host. You can get a google maps api key at http://code.google.com/apis/maps/signup.html
Important maven targets include:
- mvn compile
- mvn test
- mvn jetty:run to deploy the webapp on the port specified in pom.xml
- mvn verify to run the unit and integration tests
Maven dependencies
To add new library dependencies, look them up via a repository search engine like http://mvnrepository.com or http://search.maven.org (see http://maven.apache.org/general.html#How_to_find_dependencies for more repository search engines). If the library is available in a public repository, add its pom snippet to the pom.xml; otherwise you may need to install it manually in our local maven staging repository. You can also add maven repositories to Archiva as a proxy connector, which will keep our pom.xml simpler.
Code Formatting:
- Go into eclipse preferences
- Navigate to Java > Code Style > Formatter
- Click import and import the eclipse format file
Resources and Links
Infrastructure Documentation
Coding Standards
Design Document
Technology stack
Hibernate
We currently use Hibernate 3 for object-relational mapping. Relationships are specified via JPA annotations, but the DAOs that store, retrieve, and update persistent entities use a Hibernate SessionFactory behind the scenes instead of a JPA EntityManager, which would be the more framework-agnostic choice.
More information about the tDAR hibernate implementation can be found at the tDAR Hibernate Documentation Page.
Postgres
PostGIS
Spring
We are currently using Spring to make Hibernate easier to use, for dependency injection/IoC purposes (managing our beans/services/daos/data sources/hibernate sessions) and for transaction management.
Struts 2
We are currently using Struts 2.2.1 as the web application framework. More information about the Struts implementation can be found at the tDAR Struts Documentation Page.
- A quick primer on the web technologies used by struts: http://struts.apache.org/primer.html
jQuery
Protege
- Protege web tool: http://bmir-protege-dev1.stanford.edu/webprotege/
Other Web Frameworks
Mercurial
Our branching strategy generally works on 3 separate branches at most:
a. DEFAULT: the current unstable development branch
b. PRODUCTION: the branch that represents the production version of the code (for example, HARRIS)
c. RELEASE: the branch that’s being prepared for production. It is created about one month before the release and is pruned (if necessary) of unstable code that will not go into production; for example, the Video code has been pruned so that it never shows up.
Once a release is made, the new production branch and DEFAULT are kept in parallel (with patches moving back and forth regularly for bug fixes), and the old production branch is retired.
URL Rewriting: http://tuckey.org/urlrewrite/
Performance tuning
- http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html
- http://struts.apache.org/2.x/docs/performance-tuning.html
- web performance tuning
- hibernate tuning
- infoq article on hibernate tuning
Massively parallel data stores
- Hadoop+HBase vs RDBMS
- Hadoop applicability
We should look into Hadoop for ideas about scaling data integration and/or representation of datasets.
Database Connection Pooling
We're currently using c3p0 connection pooling. From their documentation:
Configuring Connection Testing
c3p0 can be configured to test the Connections that it pools in a variety of ways, to minimize the likelihood that your application will see broken or "stale" Connections. Pooled Connections can go bad for a variety of reasons – some JDBC drivers intentionally "time-out" long-lasting database Connections; back-end databases or networks sometimes go down "stranding" pooled Connections; and Connections can simply become corrupted over time and use due to resource leaks, driver bugs, or other causes.
c3p0 provides users a great deal of flexibility in testing Connections, via the following configuration parameters:
automaticTestTable
connectionTesterClassName
idleConnectionTestPeriod
preferredTestQuery
testConnectionOnCheckin
testConnectionOnCheckout
idleConnectionTestPeriod, testConnectionOnCheckout, and testConnectionOnCheckin control when Connections will be tested. automaticTestTable, connectionTesterClassName, and preferredTestQuery control how they will be tested.
When configuring Connection testing, first try to minimize the cost of each test. By default, Connections are tested by calling the getTables() method on a Connection's associated DatabaseMetaData object. This has the advantage of working with any database, and regardless of the database schema. However, empirically a DatabaseMetaData.getTables() call is often much slower than a simple database query.
The most convenient way to speed up Connection testing is to define the parameter automaticTestTable. Using the name you provide, c3p0 will create an empty table, and make a simple query against it to test the database. Alternatively, if your database schema is fixed prior to your application's use of the database, you can simply define a test query with the preferredTestQuery parameter. Be careful, however. Setting preferredTestQuery will lead to errors as Connection tests fail if the query target table does not exist in your database table prior to initialization of your DataSource.
Advanced users may define any kind of Connection testing they wish, by implementing a ConnectionTester and supplying the fully qualified name of the class as connectionTesterClassName. If you'd like your custom ConnectionTesters to honor and support the preferredTestQuery and automaticTestTable parameters, implement UnifiedConnectionTester, most conveniently by extending AbstractConnectionTester. See the api docs for more information.
The most reliable time to test Connections is on check-out. But this is also the most costly choice from a client-performance perspective. Most applications should work quite reliably using a combination of idleConnectionTestPeriod and testConnectionOnCheckin. Both the idle test and the check-in test are performed asynchronously, which leads to better performance, both perceived and actual.
Note that for many applications, high performance is more important than the risk of an occasional database exception. In its default configuration, c3p0 does no Connection testing at all. Setting a fairly long idleConnectionTestPeriod, and not testing on checkout and check-in at all is an excellent, high-performance approach.
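As a rough illustration of the two approaches described in the quoted documentation, the sketch below builds two sets of c3p0 settings with plain java.util.Properties. The keys follow the naming used in a c3p0.properties file. The class name, method names, the test periods (300 and 60 seconds) and the "SELECT 1" test query are assumptions made for this example; they are not taken from the tDAR configuration, and this page does not describe how tDAR actually wires its c3p0 settings.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the tDAR codebase.
class C3p0TestingConfig {

    // Idle-only testing: a fairly long idle test period, with no tests on
    // check-out or check-in (the "high-performance approach" quoted above).
    static Properties highPerformance() {
        Properties p = new Properties();
        p.setProperty("c3p0.idleConnectionTestPeriod", "300"); // seconds; assumed value
        p.setProperty("c3p0.testConnectionOnCheckout", "false");
        p.setProperty("c3p0.testConnectionOnCheckin", "false");
        // A cheap query that needs no table, used instead of the slower
        // DatabaseMetaData.getTables() default. Assumed value.
        p.setProperty("c3p0.preferredTestQuery", "SELECT 1");
        return p;
    }

    // Idle test plus check-in test: both run asynchronously, so the more
    // costly check-out test stays disabled.
    static Properties idleAndCheckin() {
        Properties p = highPerformance();
        p.setProperty("c3p0.idleConnectionTestPeriod", "60"); // seconds; assumed value
        p.setProperty("c3p0.testConnectionOnCheckin", "true");
        return p;
    }

    // Prints the settings in key order, one key=value pair per line.
    static void print(String label, Properties p) {
        System.out.println("# " + label);
        for (Map.Entry<Object, Object> e : new TreeMap<Object, Object>(p).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }

    public static void main(String[] args) {
        print("idle-only testing", highPerformance());
        print("idle + check-in testing", idleAndCheckin());
    }
}
```

The printed key=value lines could be pasted into a c3p0.properties file. If the schema does not allow a fixed test query, automaticTestTable would be the alternative to preferredTestQuery, as the documentation above notes.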
Other documents and miscellany
- Interesting ppt presentation on integrating ontologies with application development (out of date though)
- Older Gridsphere docs: GridSphere and Gridsphere Infrastructure
- Many Eyes
- Swivel
- interesting dataset visualization possibilities - http://googlegeodevelopers.blogspot.com/2009/01/timemap-helping-you-add-4th-dimension.html