
This page documents the steps required to use OpenDJ as a repository for OpenIDM.

Goals

The primary goal of this effort is to support generic CREST repositories. OpenDJ's Rest2LDAP currently exposes a CREST CollectionResourceProvider that should be suitable as a repository for IDM.

Initializing the Repo

Given that we are looking for a generic CollectionResourceProvider to act as a repository, it is likely we will need some custom initialization code, depending on the implementation of the interface. This would probably be a Groovy or JavaScript script referenced from the repo config.

Configuring the CollectionResourceProvider

Rest2LDAP provides a builder for easily configuring and creating an LDAPCollectionResourceProvider. We can use the Rest2LDAP.builder() method to instantiate the builder and then use the following methods to configure and build it:

  • ldapConnectionFactory(ConnectionFactory factory) 
    • Sets the connection factory
  • configureMapping(JsonValue configuration)
    • Configures the JSON to LDAP mapping
  • build()
    • Builds the resource provider

The ConnectionFactory can be created using the Rest2LDAP.configureConnectionFactory(JsonValue configuration) method. See the opendj.rest2ldap-servlet.json configuration file for an example of the connection factory config and two mapping configuration examples.
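A rough sketch of wiring these calls together follows, using only the builder methods listed above. The class name, variable names, and the connectionConfig/mappingConfig JsonValue objects are placeholders, and package locations may differ between OpenDJ/CREST versions:

    import org.forgerock.json.fluent.JsonValue;
    import org.forgerock.json.resource.CollectionResourceProvider;
    import org.forgerock.opendj.ldap.ConnectionFactory;
    import org.forgerock.opendj.rest2ldap.Rest2LDAP;

    // Sketch only: connectionConfig and mappingConfig are JsonValue objects read
    // from the repo configuration (see opendj.rest2ldap-servlet.json for examples).
    public class OpenDJProviderFactory {

        public static CollectionResourceProvider newProvider(
                JsonValue connectionConfig, JsonValue mappingConfig) {
            // Build the LDAP connection factory from its JSON configuration.
            ConnectionFactory ldapFactory =
                    Rest2LDAP.configureConnectionFactory(connectionConfig);
            // Configure the JSON-to-LDAP mapping and build the resource provider.
            return Rest2LDAP.builder()
                    .ldapConnectionFactory(ldapFactory)
                    .configureMapping(mappingConfig)
                    .build();
        }
    }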

Embedding OpenDJ

OpenDJ can be embedded in the openidm-repo-opendj module and started at startup, similarly to how the embedded OrientDB is started. There is a good implementation to use as a reference in the AMSessionStore project in OpenAM. In OpenIDM we will probably need to initialize and start the embedded OpenDJ instance in the bundle Activator (see SetupOpenDJ.main() and EmbeddedOpenDJ.setup()) and register a RepoBootService implementation, as sketched below.
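A minimal sketch of that bundle activation, assuming helper classes modelled on OpenAM's EmbeddedOpenDJ; EmbeddedOpenDJ, its shutdown() call, and OpenDJRepoBootService are illustrative names rather than existing API:

    import org.osgi.framework.BundleActivator;
    import org.osgi.framework.BundleContext;

    // Sketch only: EmbeddedOpenDJ and OpenDJRepoBootService are assumed classes
    // following the OpenAM AMSessionStore approach referenced above.
    public class Activator implements BundleActivator {

        @Override
        public void start(BundleContext context) throws Exception {
            // Initialize and start the embedded OpenDJ instance before any repo access.
            EmbeddedOpenDJ.setup();
            // Register the boot repo service so early bootstrap reads come from DJ.
            context.registerService("org.forgerock.openidm.repo.RepoBootService",
                    new OpenDJRepoBootService(), null);
        }

        @Override
        public void stop(BundleContext context) throws Exception {
            // Stop the embedded instance when the bundle is stopped (assumed helper).
            EmbeddedOpenDJ.shutdown();
        }
    }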

Concurrency Controls

For a generic CREST repo we would likely need a configurable revision property that would be injected into _rev.

LdapCollectionResourceProvider currently supports optimistic concurrency via ETags, using the revision property of the request. The ETag attribute is configured via the "etagAttribute" property of the Rest2LDAP config, and it likely needs to be requested as part of the "additionalLDAPAttributes" configuration as well.
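For example, an MVCC-aware update through the repo would carry the expected revision on the CREST request. A hedged sketch using the standard CREST Requests helper; the resource path, id, and content are illustrative, and package locations vary between CREST versions:

    import java.util.LinkedHashMap;
    import org.forgerock.json.fluent.JsonValue;
    import org.forgerock.json.resource.Requests;
    import org.forgerock.json.resource.UpdateRequest;

    // Sketch only: newContent is a placeholder for the updated object. The revision
    // on the request is compared against the entry's configured etagAttribute, so a
    // mismatch should fail the update with a version conflict.
    JsonValue newContent = new JsonValue(new LinkedHashMap<String, Object>());
    UpdateRequest update = Requests.newUpdateRequest("managed/user", "user.1", newContent)
            .setRevision("3");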

Given the "eventual" consistency model of replicated DJ, this will only work on a single node. We may need to force MVCC operations to a primary node.

Elastic Concurrency

If we wish to achieve elastic scalability, forcing MVCC operations to a single primary node will not be ideal. We could potentially implement a soft-shard approach where we establish MVCC locality on a given node for a subset of the data. The basic example below demonstrates locality based on the first character of a hash of the userName; a code sketch of the mapping follows the tables.

Example ID set with Hashes

User       Hash
user.1     b6269e6ed1cbd6689cab7cc6139a7b8d
user.2     5c7daa62a9816123819f1524d7edc132
user.3     1ce18f8c6727b7a7162ddcee62180717
user.4     2e058cc3c38c269c289817aa81e091f5
user.5     354f82a821e56f755bdea1c3df239044
user.6     950b3ee562daa927d7aed37516148084
user.7     422be577514949b279ad83b06df1dc34
user.8     3da6fdf9465323cbef7ffc3841631162
user.9     c14fac048822674470b2df7588164156
user.10    56f091fee2df5c233f2722212f802d02

If we have 4 servers they would get hashes starting with 0-3, 4-7, 8-b, c-f respectively.

Range   Server     Users
0-3     Server 1   user.3, user.4, user.5, user.8
4-7     Server 2   user.2, user.7, user.10
8-b     Server 3   user.1, user.6
c-f     Server 4   user.9

Naturally, as the data set grew, distribution would become more even.
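A minimal sketch of the hash-to-node mapping used in the example above, assuming an MD5 hash of the userName and nodes that each own an equal slice of the first hex character's range; the class and method names are illustrative:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    // Sketch only: maps a userName to the node that "owns" MVCC for it.
    // With 4 nodes: first hex char 0-3 -> node 0, 4-7 -> node 1, 8-b -> node 2, c-f -> node 3.
    public final class ShardLocator {

        public static int nodeFor(String userName, int nodeCount) throws Exception {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(userName.getBytes(StandardCharsets.UTF_8));
            // High nibble of the first byte is the first hex character of the hash (0-15).
            int firstNibble = (digest[0] >> 4) & 0x0f;
            // Divide the 16 possible values evenly across the available nodes.
            return firstNibble * nodeCount / 16;
        }
    }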

 

An initial implementation would require that all cluster nodes simply be replicas in DJ, which would result in every cluster node holding a full dataset. The node designated for a segment would simply hold the most recent copy of that segment's data.

 

Possibility of data loss: there is a possibility of data loss if a cluster node goes down with changes that have not yet been replicated to the rest of the cluster. It may be possible to designate multiple primary nodes per subset, similar to Elasticsearch (http://www.elastic.co/guide/en/elasticsearch/guide/master/_scale_horizontally.html). This could also eventually lead to true sharding with data partitioning, so that all servers are not required to carry a full dataset (at the cost of fault tolerance).



Potential Issues

LDAPCollectionResourceProvider currently only supports querying via queryFilter. We can hard-code basic queries such as query-all-ids, as we do in the scheduler, but we will likely want to look into having named queryFilters rather than queryExpressions to make these more portable.
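For instance, a hard-coded query-all-ids equivalent could be expressed as a queryFilter request. A hedged sketch using the CREST Requests helper; the resource path is illustrative and package locations vary between CREST versions:

    import org.forgerock.json.resource.QueryFilter;
    import org.forgerock.json.resource.QueryRequest;
    import org.forgerock.json.resource.Requests;

    // Sketch only: a "match everything, return only _id" query standing in for
    // query-all-ids until named queryFilters are supported.
    QueryRequest queryAllIds = Requests.newQueryRequest("managed/user")
            .setQueryFilter(QueryFilter.alwaysTrue())
            .addField("_id");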

 

Implementation Steps

Phase 1

For the initial phase of implementation we will be supplementing the existing OrientDB repository. We will create an additional repo.opendj.json file that will sit alongside repo.orientdb.json. Calls for managed users will be intercepted on the router and sent to OpenDJRepoService.

  • Create new repository module (OPENIDM-3153)
  • Create OpenDJRepoService for handling persistence via Rest2LDAP (OPENIDM-3173)
  • Create configuration mapping for managed users (OPENIDM-3158)
  • Add support for embedded DJ server (OPENIDM-3161)

Phase 2

Phase two of the implementation will consist of adding persistence capabilities beyond managed users (config, schedulers, audit). We may be able to drop OrientDB during this phase if all persistence requirements are met; this would likely mean using H2 as the Activiti store.

  • Scheduler Persistence (OPENIDM-3171)
  • Cluster Config (OPENIDM-3172)
  • Audit Persistence
  • Config Persistence (OPENIDM-3169)
  • Links Persistence (OPENIDM-3163)

Activiti

Activiti is currently very tightly coupled to a SQL relational database via JDBC. There has been some initial work by a core member to get Activiti working on top of Neo4j. In the interim, Activiti could be supported via H2, as it is currently with OrientDB.
