To support horizontal scalability use cases as well as geographic distribution, Fedora 4 can be configured as a cluster of application servers. |
Note: This feature is still undergoing development.
Warning: There are still some issues present in Fedora 4 which may lead to partial ingests due to synchronization timeouts. This can be partially mitigated by increasing the replTimeout property in infinispan.xml; a minimal fragment follows.
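For example, the relevant fragment in infinispan.xml looks like this (a minimal sketch; the complete example configuration later on this page uses the same element, and the value is in milliseconds):

Code Block
<clustering mode="replication">
    <!-- synchronous replication timeout in milliseconds; raise this to avoid partial ingests -->
    <sync replTimeout="600000"/>
</clustering>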
Configuration
Fedora 4 is built on top of the JCR implementation Modeshape. Modeshape uses Infinispan as a distributed datastore, which in turn uses the messaging toolkit JGroups to transfer state between nodes.
Therefore, the following resources and documents contain a lot of important information about configuring Fedora 4's underlying projects:
- Fedora configuration inventory
- Modeshape documentation
- Modeshape configuration section
- Infinispan documentation
- Infinispan configuration section
- JGroups documentation
- JGroups configuration section
Step-by-step guides for deploying Fedora 4 clusters
Deploy cluster using the UDP Multicast protocol for node discovery and the TCP protocol for replication
A few configuration options have to be set in order to run Fedora 4 as a cluster on a local machine:
- -Xmx1024m Set the Java heap to 1GB
- -XX:MaxPermSize=256m Set the Java PermGen size to 256MB
- -Dfcrepo.modeshape.configuration=file:///path/to/repository.json The Modeshape configuration used for clustering
- -Djava.net.preferIPv4Stack=true Tell Java to use IPv4 rather than IPv6
- -Dfcrepo.ispn.jgroups.configuration=/path/to/jgroups-fcrepo-tcp.xml Set the JGroups configuration file holding the TCP transport definitions
- -Djgroups.udp.mcast_addr=239.42.42.42 Set the UDP multicast address for the JGroups cluster
- -Dfcrepo.infinispan.cache_configuration=/path/to/infinispan.xml Set the Infinispan configuration file holding the Infinispan cluster configuration
JGroups configuration
In order to use UDP multicasting for node discovery (the MPING element below) and TCP for replication, the following JGroups example configuration can be used:
...
Code Block
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.0.xsd">
    <TCP bind_port="7800" loopback="false"
         recv_buf_size="${tcp.recv_buf_size:5M}" send_buf_size="${tcp.send_buf_size:640K}"
         max_bundle_size="64K" max_bundle_timeout="30"
         use_send_queues="true" sock_conn_timeout="300"
         timer_type="new3" timer.min_threads="4" timer.max_threads="10"
         timer.keep_alive_time="3000" timer.queue_max_size="500"
         thread_pool.enabled="true" thread_pool.min_threads="1" thread_pool.max_threads="10"
         thread_pool.keep_alive_time="5000" thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="10000" thread_pool.rejection_policy="discard"
         oob_thread_pool.enabled="true" oob_thread_pool.min_threads="1" oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000" oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100" oob_thread_pool.rejection_policy="discard"/>
    <MPING timeout="1000" num_initial_members="1"/>
    <MERGE2 max_interval="30000" min_interval="10000"/>
    <FD_ALL timeout="150000"/>
    <VERIFY_SUSPECT timeout="150000"/>
    <BARRIER/>
    <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="true"/>
    <UNICAST timeout="600,900,2500"/>
    <pbcast.STABLE stability_delay="2000" desired_avg_gossip="50000" max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="6000" view_bundling="true"/>
    <MFC max_credits="2M" min_threshold="0.4"/>
    <FRAG2 frag_size="60K"/>
    <pbcast.STATE_TRANSFER/>
</config>
Infinispan configuration
The following example configuration has its replication timeout set to 10 minutes in order to mitigate synchronization timeouts when a single transaction spans a large number of operations:
Code Block
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
            xmlns="urn:infinispan:config:5.2">
    <global>
        <globalJmxStatistics enabled="true" allowDuplicateDomains="true"/>
        <transport clusterName="modeshape-cluster">
            <properties>
                <property name="configurationFile" value="${fcrepo.ispn.jgroups.configuration:config/jgroups-fcrepo-tcp.xml}"/>
            </properties>
        </transport>
    </global>
    <default>
        <clustering mode="distribution">
            <sync replTimeout="600000"/>
            <l1 enabled="false" lifespan="0" onRehash="false"/>
            <hash numOwners="${fcrepo.ispn.numOwners:2}"/>
            <stateTransfer chunkSize="100" fetchInMemoryState="true"/>
        </clustering>
    </default>
    <namedCache name="FedoraRepository">
        <clustering mode="replication">
            <sync replTimeout="6000000"/>
            <l1 enabled="false" lifespan="0" onRehash="false"/>
            <stateTransfer chunkSize="100" fetchInMemoryState="true" timeout="120000"/>
        </clustering>
        <locking isolationLevel="READ_COMMITTED" writeSkewCheck="false" lockAcquisitionTimeout="150000" useLockStriping="true"/>
        <transaction transactionMode="TRANSACTIONAL" lockingMode="PESSIMISTIC"/>
        <loaders passivation="false" shared="false" preload="false">
            <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="true" purgeOnStartup="false">
                <properties>
                    <property name="location" value="${fcrepo.ispn.repo.CacheDirPath:target/FedoraRepository/storage}"/>
                    <property name="fsyncMode" value="perWrite"/>
                </properties>
            </loader>
        </loaders>
    </namedCache>
    <namedCache name="FedoraRepositoryMetaData">
        <clustering mode="distribution">
            <sync replTimeout="600000"/>
            <l1 enabled="false" lifespan="0" onRehash="false"/>
            <hash numOwners="${fcrepo.ispn.numOwners:2}"/>
            <stateTransfer chunkSize="100" fetchInMemoryState="true"/>
        </clustering>
        <locking concurrencyLevel="1000" lockAcquisitionTimeout="150000" useLockStriping="false"/>
        <deadlockDetection enabled="true" spinDuration="1000"/>
        <eviction maxEntries="500" strategy="LIRS" threadPolicy="DEFAULT"/>
        <transaction transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup" transactionMode="TRANSACTIONAL" lockingMode="PESSIMISTIC"/>
        <loaders passivation="false" shared="false" preload="false">
            <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="true" purgeOnStartup="false">
                <properties>
                    <property name="location" value="${fcrepo.ispn.CacheDirPath:target/FedoraRepositoryMetaData/storage}"/>
                    <property name="fsyncMode" value="perWrite"/>
                </properties>
            </loader>
        </loaders>
    </namedCache>
    <namedCache name="FedoraRepositoryBinaryData">
        <clustering mode="distribution">
            <sync replTimeout="600000"/>
            <l1 enabled="false" lifespan="0" onRehash="false"/>
            <hash numOwners="${fcrepo.ispn.numOwners:2}"/>
            <stateTransfer chunkSize="100" fetchInMemoryState="true"/>
        </clustering>
        <locking concurrencyLevel="1000" lockAcquisitionTimeout="150000" useLockStriping="false"/>
        <deadlockDetection enabled="true" spinDuration="1000"/>
        <eviction maxEntries="100" strategy="LIRS" threadPolicy="DEFAULT"/>
        <transaction transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup" transactionMode="TRANSACTIONAL" lockingMode="PESSIMISTIC"/>
        <loaders passivation="false" shared="false" preload="false">
            <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="true" purgeOnStartup="false">
                <properties>
                    <property name="location" value="${fcrepo.ispn.binary.CacheDirPath:target/FedoraRepositoryBinaryData/storage}"/>
                    <property name="fsyncMode" value="perWrite"/>
                </properties>
            </loader>
        </loaders>
    </namedCache>
</infinispan>
Modeshape configuration
The following configuration has indexing disabled completely in order to increase ingest performance:
Code Block
{
    "name" : "repo",
    "jndiName" : "",
    "workspaces" : {
        "predefined" : ["default"],
        "default" : "default",
        "allowCreation" : true
    },
    "clustering" : {
        "clusterName" : "modeshape-cluster"
    },
    "query" : {
        "enabled" : "false"
    },
    "storage" : {
        "cacheName" : "FedoraRepository",
        "cacheConfiguration" : "${fcrepo.infinispan.cache_configuration:config/infinispan/clustered/infinispan.xml}",
        "binaryStorage" : {
            "type" : "cache",
            "dataCacheName" : "FedoraRepositoryBinaryData",
            "metadataCacheName" : "FedoraRepositoryMetaData"
        }
    },
    "security" : {
        "anonymous" : {
            "roles" : ["readonly","readwrite","admin"],
            "useOnFailedLogin" : false
        },
        "providers" : [
            { "classname" : "org.fcrepo.http.commons.session.BypassSecurityServletAuthenticationProvider" }
        ]
    },
    "node-types" : ["fedora-node-types.cnd"]
}
Using Tomcat7
- Build the fcrepo WAR file or download the prebuilt fcrepo WAR file
- Build the WAR file as described on this page
- Fetch the WAR file from the download page
- Get Tomcat
Download Tomcat 7 and unpack it (Tomcat 7.0.50 is used in this example)
Code Block
#> wget http://mirror.synyx.de/apache/tomcat/tomcat-7/v7.0.50/bin/apache-tomcat-7.0.50.tar.gz
#> tar -zxvf apache-tomcat-7.0.50.tar.gz
#> mv apache-tomcat-7.0.50 tomcat7
- Put the WAR file into Tomcat's webapps directory or create a symbolic link
Copy the fcrepo-webapp-VERSION.war file
Code Block
#> cp fcrepo-webapp-VERSION.war tomcat7/webapps/fcrepo.war
- Set the send/recv buffer sizes if necessary
Use the following commands to set the buffer size. To persist these settings between reboots, also put them in /etc/sysctl.conf.
Code Block
#> sysctl net.core.rmem_max=26214400
#> sysctl net.core.wmem_max=5242880
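To persist the settings as mentioned above, the corresponding /etc/sysctl.conf entries would look like this (same values as the commands above):

Code Block
# /etc/sysctl.conf -- persist the JGroups send/recv buffer sizes across reboots
net.core.rmem_max = 26214400
net.core.wmem_max = 5242880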
- Start instances
Using a custom configuration by pointing Fedora 4 to custom configuration files:
Code Block
#> CATALINA_OPTS="-Xmx1024m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Djgroups.udp.mcast_addr=239.42.42.42 -Dfcrepo.modeshape.configuration=file:///path/to/repository.json -Dfcrepo.ispn.jgroups.configuration=/path/to/jgroups-fcrepo-tcp.xml -Dfcrepo.infinispan.cache_configuration=/path/to/infinispan.xml" bin/catalina.sh run
Deploy cluster using the UDP Multicast protocol for node discovery and replication
Warning: There are currently still issues when using UDP multicasting for replication, while using UDP for node discovery works as intended.
...
- -Xmx1024m Set the Java heap to 1GB
- -XX:MaxPermSize=256m Set the Java PermGen size to 256MB
- -Dfcrepo.modeshape.configuration=file:///path/to/repository.json The Modeshape configuration used for clustering
- -Djava.net.preferIPv4Stack=true Tell Java to use IPv4 rather than IPv6
- -Dfcrepo.ispn.jgroups.configuration=/path/to/jgroups-fcrepo-udp.xml Set the JGroups configuration file holding the UDP transport definitions
- -Djgroups.udp.mcast_addr=239.42.42.42 Set the UDP multicast address for the JGroups cluster
- -Dfcrepo.infinispan.cache_configuration=/path/to/infinispan.xml Set the Infinispan configuration file holding the Infinispan cluster configuration
Using Tomcat7
- Build the fcrepo WAR file or download the prebuilt fcrepo WAR file
- Build the WAR file as described on this page
- Fetch the WAR file from the download page
- Get Tomcat
Download Tomcat 7 and unpack it (Tomcat 7.0.50 is used in this example)
Code Block
#> wget http://mirror.synyx.de/apache/tomcat/tomcat-7/v7.0.50/bin/apache-tomcat-7.0.50.tar.gz
#> tar -zxvf apache-tomcat-7.0.50.tar.gz
#> mv apache-tomcat-7.0.50 tomcat7
- Put the WAR file into Tomcat's webapps directory or create a symbolic link
Copy the fcrepo-webapp-VERSION.war file
Code Block
#> cp fcrepo-webapp-VERSION.war tomcat7/webapps/fcrepo.war
- Set up the cluster configuration (optional)
- This GitHub project contains a sample configuration for a cluster in distributed mode.
- Change the configuration as required; descriptions are available in the JGroups, Infinispan and Modeshape documentation.
- Make sure to point Fedora 4 to the configuration files by updating the file $TOMCAT_HOME/bin/setenv.sh (create it if necessary) using the properties fcrepo.modeshape.configuration, fcrepo.ispn.jgroups.configuration and fcrepo.infinispan.cache_configuration; a sketch of such a file follows.
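A sketch of such a setenv.sh (all paths are placeholders and must be adjusted to your installation):

Code Block
# $TOMCAT_HOME/bin/setenv.sh -- sketch only; /path/to/... are placeholders
CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true"
CATALINA_OPTS="$CATALINA_OPTS -Djgroups.udp.mcast_addr=239.42.42.42"
CATALINA_OPTS="$CATALINA_OPTS -Dfcrepo.modeshape.configuration=file:///path/to/repository.json"
CATALINA_OPTS="$CATALINA_OPTS -Dfcrepo.ispn.jgroups.configuration=/path/to/jgroups-fcrepo-udp.xml"
CATALINA_OPTS="$CATALINA_OPTS -Dfcrepo.infinispan.cache_configuration=/path/to/infinispan.xml"
export CATALINA_OPTS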
- Set the send/recv buffer sizes if necessary
Use the following commands to set the buffer size
Code Block
#> sysctl net.core.rmem_max=5242880
#> sysctl net.core.wmem_max=5242880
- Start instance
Using the default clustered configuration (Replication mode):
Code Block
#> CATALINA_OPTS="-Xmx1024m -XX:MaxPermSize=256m -Dfcrepo.modeshape.configuration=config/clustered/repository.json -Djava.net.preferIPv4Stack=true -Djgroups.udp.mcast_addr=239.42.42.42" bin/catalina.sh run
Using a custom configuration by pointing Fedora 4 to custom configuration files:
Code Block
#> CATALINA_OPTS="-Xmx1024m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Djgroups.udp.mcast_addr=239.42.42.42 -Dfcrepo.modeshape.configuration=file:///path/to/repository.json -Dfcrepo.ispn.jgroups.configuration=/path/to/jgroups-fedora-udp.xml -Dfcrepo.infinispan.cache_configuration=/path/to/infinispan.xml" bin/catalina.sh run
Deploy cluster using the UDP Multicast protocol for node discovery and replication on a single machine
Warning: There are currently still issues when using UDP multicasting for replication, while using UDP for node discovery works as intended.
...
- -Xmx1024m Set the Java heap to 1GB
- -XX:MaxPermSize=256m Set the Java PermGen size to 256MB
- -Dfcrepo.modeshape.configuration=file:///home/ruckus/dev/tomcat7-8081/repository.json The Modeshape configuration used for clustering
- -Djava.net.preferIPv4Stack=true Tell Java to use IPv4 rather than IPv6
- -Dfcrepo.ispn.jgroups.configuration=/home/ruckus/dev/tomcat7-8081/jgroups-fcrepo-udp.xml Set the JGroups configuration file holding the UDP transport definitions
- -Djgroups.udp.mcast_addr=239.42.42.42 Set the UDP multicast address for the JGroups cluster
- -Dfcrepo.infinispan.cache_configuration=/home/ruckus/dev/tomcat7-8081/infinispan.xml Set the Infinispan configuration file holding the Infinispan cluster configuration
- -Djms.port=61617 The port used by ActiveMQ's JMS protocol. This needs to be distinct for every instance of fcrepo4
- -Dstomp.port=61614 The port used by ActiveMQ's STOMP protocol. This needs to be distinct for every instance of fcrepo4
Using Tomcat7
- Build the fcrepo WAR file or download the prebuilt fcrepo WAR file
- Build the WAR file as described on this page
- Fetch the WAR file from the download page
- Get Tomcat
Download Tomcat 7 and unpack it (Tomcat 7.0.50 is used in this example)
Code Block
#> wget http://mirror.synyx.de/apache/tomcat/tomcat-7/v7.0.50/bin/apache-tomcat-7.0.50.tar.gz
#> tar -zxvf apache-tomcat-7.0.50.tar.gz
#> mv apache-tomcat-7.0.50 tomcat7-8080
- Put the WAR file into Tomcat's webapps directory or create a symbolic link
Copy the fcrepo-webapp-VERSION.war file
Code Block
#> cp fcrepo-webapp-VERSION.war tomcat7-8080/webapps/fcrepo.war
- Get the Infinispan XML configuration file
Download it from GitHub and put it into tomcat7-8080
Code Block
#> wget -O infinispan.xml https://gist.github.com/fasseg/8646707/raw
#> mv infinispan.xml tomcat7-8080/
- Get the Modeshape JSON configuration file
Download it from GitHub and put it into tomcat7-8080
Code Block
#> wget -O repository.json https://gist.github.com/fasseg/8646727/raw
#> mv repository.json tomcat7-8080/
- Get the JGroups UDP Transport configuration file
Download it from GitHub and put it into tomcat7-8080
Code Block
#> wget -O jgroups-fedora-udp.xml https://gist.github.com/fasseg/8646743/raw
#> mv jgroups-fedora-udp.xml tomcat7-8080/
- Set the send/recv buffer sizes if necessary
Use the following commands to set the buffer size
Code Block
#> sysctl net.core.rmem_max=5242880
#> sysctl net.core.wmem_max=5242880
- Copy the whole tomcat folder in order to create a second instance
Code Block
#> cp -R tomcat7-8080/ tomcat7-8081
- Change the connector ports of the second Tomcat instance to 8081, 8006 and 8010
Code Block
#> sed -i 's/8080/8081/g' tomcat7-8081/conf/server.xml
#> sed -i 's/8005/8006/g' tomcat7-8081/conf/server.xml
#> sed -i 's/8009/8010/g' tomcat7-8081/conf/server.xml
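To verify that the ports were actually rewritten, a simple grep should show the three changed values (and nothing should match the old ones):

Code Block
#> grep -E '8081|8006|8010' tomcat7-8081/conf/server.xml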
- Start the first instance
Code Block
#> CATALINA_OPTS="-Xmx1024m -XX:MaxPermSize=256m -Dfcrepo.modeshape.configuration=file:///path/to/repository.json -Djava.net.preferIPv4Stack=true -Dfcrepo.ispn.jgroups.configuration=/path/to/jgroups-fedora-udp.xml -Djgroups.udp.mcast_addr=239.42.42.42 -Dfcrepo.infinispan.cache_configuration=/path/to/infinispan.xml" tomcat7-8080/bin/catalina.sh run
- Start the second instance
Code Block
#> CATALINA_OPTS="-Xmx1024m -XX:MaxPermSize=256m -Dfcrepo.modeshape.configuration=file:///path/to/repository.json -Djava.net.preferIPv4Stack=true -Dfcrepo.ispn.jgroups.configuration=/path/to/jgroups-fedora-udp.xml -Djgroups.udp.mcast_addr=239.42.42.42 -Dfcrepo.infinispan.cache_configuration=/path/to/infinispan.xml -Djms.port=61617 -Dstomp.port=61614" tomcat7-8081/bin/catalina.sh run
- Check that the instances are reachable and that the cluster size is '2':
Code Block
#> wget http://localhost:8080/fcrepo/rest
#> wget http://localhost:8081/fcrepo/rest
- Navigate to http://localhost:8080/fcrepo/rest
Load Balancing Fedora 4 using Apache and mod_jk
Load balancing can be achieved by using an Apache server with mod_jk in front of the Fedora 4 cluster. With mod_jk, one has to create as many workers in the workers.properties configuration file as there are Fedora 4 nodes; a workers.properties sketch follows below.
See this example on the RedHat pages
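As a sketch, a workers.properties for a two-node cluster could look like the following; the worker names, the host addresses, and the assumption that both Tomcats listen on the default AJP port 8009 are placeholders:

Code Block
# workers.properties -- sketch for a two-node Fedora 4 cluster (hosts are placeholders)
worker.list=loadbalancer
# one ajp13 worker per Fedora 4 node
worker.fcrepo1.type=ajp13
worker.fcrepo1.host=192.168.47.10
worker.fcrepo1.port=8009
worker.fcrepo2.type=ajp13
worker.fcrepo2.host=192.168.47.11
worker.fcrepo2.port=8009
# the lb worker distributes requests across the node workers
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=fcrepo1,fcrepo2

Requests are then routed to the balancer in the Apache configuration with a directive such as JkMount /fcrepo/* loadbalancer.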
...
Using TCP for Discovery and Sync
UDP is the recommended protocol for larger clusters, but if you cannot use it, TCP can be used instead. There are some things to watch out for when configuring TCP.
TCPPING Element
Each host has a different TCPPING configuration. This can be tricky to configure (see the sketch after this list):
- The initial_hosts attribute should only include remote hosts for the local node, not itself.
- The num_initial_members should match the count of remote hosts, i.e. it does not include the local node either.
- If you are running just one port on each host, then port_range will be 0.
- IP resolution is important if you use DNS names. The locally resolved IP of each remote host must match the TCP@bind_addr in the remote host config.
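A sketch of the TCPPING element as seen from one node (call it nodeA) in a hypothetical three-node cluster of nodeA, nodeB and nodeC; the host names and port are placeholders:

Code Block
<!-- TCPPING on nodeA: only the two remote hosts are listed, and
     num_initial_members matches that count -->
<TCPPING initial_hosts="nodeB[7800],nodeC[7800]"
         port_range="0"
         num_initial_members="2"/>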
TCP Element
If you use a hostname for bind_addr, make sure that it resolves to the IP you want, probably an external address rather than the loopback IP. This is easy to miss.
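For example, binding the transport explicitly to an external address (a placeholder here) sidesteps accidental resolution to the loopback interface:

Code Block
<!-- explicit external bind address (placeholder); the remaining TCP attributes
     stay as in the example configuration above -->
<TCP bind_addr="192.168.47.10" bind_port="7800"/>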
Deploying in AWS
Java options for Tomcat or Jetty
Code Block
JAVA_OPTS="$JAVA_OPTS -Xmx1024m -XX:MaxPermSize=256m -Dfcrepo.modeshape.configuration=file:///config/clustered/repository.json"
JAVA_OPTS="$JAVA_OPTS -Dfcrepo.infinispan.cache_configuration=config/infinispan/clustered/infinispan.xml"
JAVA_OPTS="${JAVA_OPTS} -Dfcrepo.home=/tmp/wherever"
JAVA_OPTS="${JAVA_OPTS} -Djgroups.tcp.address=<private-ip-address-of-ec2-instance>"
JAVA_OPTS="${JAVA_OPTS} -Dfcrepo.ispn.numOwners=2 -Djava.net.preferIPv4Stack=true"
# The jgroups-ec2.xml file is included in ispn's jars
JAVA_OPTS="${JAVA_OPTS} -Dfcrepo.ispn.jgroups.configuration=jgroups-ec2.xml"
# This property overwrites the S3 bucketname variable in jgroups-ec2.xml
JAVA_OPTS="${JAVA_OPTS} -Djgroups.s3.bucket=<some-bucket-that-you-have-already-created>"
See this example on Fedora Cluster Installation in AWS
Firewall Notes
If your cluster has a firewall between nodes, you'll need to open the following ports:
UDP multicast source address and port for JGroups messaging
Example: if your nodes are on the 192.168.47.x network, then this would be your iptables rule:
Code Block: UDP multicast firewall rule
-A INPUT -m pkttype --pkt-type multicast -s 192.168.47.0/24 -j ACCEPT
TCP replication: TCP bind address and port (source)
Example: if your nodes are on the 192.168.47.x network, and the TCP bind_port is 7800, then this would be your iptables rule:
Code Block: TCP replication firewall rule
-A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.47.0/24 --dport 7800 -j ACCEPT
UDP replication: UDP bind address and port (source)
Example: if your nodes are on the 192.168.47.x network, and the UDP bind_port is 47788, then this would be your iptables rule:
Code Block: UDP replication firewall rule
-A INPUT -m state --state NEW -m udp -p udp -s 192.168.47.0/24 --dport 47788 -j ACCEPT
Simple Shell script to coordinate a cluster
This small script is used on the FIZ cluster (Ubuntu 12.04 LTS) to push configurations and WAR/JAR files and to start, stop, restart, and purge the cluster.
...