ServerSense runs across data centres to ensure high availability and fault tolerance.
- The Master is a single point of failure, so if the Master were to die, one of the Slaves could promote itself to become the new Master.
- Although a bespoke quorum voting system could be written, there are existing distributed coordination systems that could be used, such as ZooKeeper.
- Initially we thought ZooKeeper wouldn’t work well across data centres because of possible latency problems during voting.
- With Master/Slave, the data describing commands, servers and contacts would have to be replicated to every server in the ensemble, as we would not know in advance which server would be elected as the new Master.
- Distribute data across the nodes. Distributing a conventional SQL database is possible, but this type of clustering is typically done within a single data centre.
- NoSQL databases are limited by the CAP Theorem, which proved that a distributed database cannot simultaneously guarantee consistency (every read sees the most recent write), availability (the service is always up, with no downtime) and partition tolerance (the system keeps working when the network between nodes fails). Here is a nice proof.
- A traditional relational database is Consistent and Available but not Partition Tolerant.
- MongoDB is Consistent and Partition Tolerant but is not always Available (i.e. some of the data might not be accessible).
- Cassandra is Available and Partition Tolerant. It will eventually be consistent but cannot guarantee consistency at any given moment.
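The quorum voting mentioned above, and the effect a network partition has on it, can be illustrated with a minimal Java sketch. Everything here is hypothetical: the class name, the method names and the "lowest id wins" rule are assumptions made for illustration, not ServerSense code or the ZooKeeper API. It only shows the majority rule that prevents two masters from being elected on opposite sides of a partition.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: names and the lowest-id rule are illustrative only.
final class QuorumSketch {

    // Smallest number of nodes that forms a strict majority of the ensemble.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // True if the nodes that can still see each other form a majority.
    static boolean hasQuorum(int reachableNodes, int ensembleSize) {
        return reachableNodes >= quorumSize(ensembleSize);
    }

    // Picks a new master from the reachable nodes (lowest id wins),
    // but only if the reachable group is a majority of the ensemble.
    static Optional<Integer> electMaster(List<Integer> reachableIds, int ensembleSize) {
        if (!hasQuorum(reachableIds.size(), ensembleSize)) {
            return Optional.empty();
        }
        return reachableIds.stream().min(Integer::compare);
    }

    public static void main(String[] args) {
        // Five nodes split 3/2 by a partition between data centres.
        System.out.println(electMaster(List.of(2, 3, 5), 5)); // majority side: Optional[2]
        System.out.println(electMaster(List.of(1, 4), 5));    // minority side: Optional.empty
    }
}
```

Because two disjoint groups cannot both hold a majority, at most one side of a partition can promote a master. The price is that the minority side stops electing, which is the consistency versus availability trade-off described by CAP.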
For ServerSense this is a reasonable compromise because:
- The application must continue monitoring at all costs.
- Cassandra is also optimized for writes; since the result of each monitoring command has to be recorded, this is highly desirable.
- Cassandra also has a couple of Java APIs that can be used, and it is supported by a large open source community and by DataStax, a company that offers support.
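How much consistency is given up can be tuned: Cassandra lets each read and write choose a consistency level such as ONE or QUORUM. The sketch below is a hypothetical illustration of the replica-overlap rule behind that choice (R + W > N). It does not call any Cassandra API, and the replication factor of 3 is an assumption rather than a ServerSense setting.

```java
// Hypothetical sketch of the replica-overlap rule; not a Cassandra API.
final class ConsistencySketch {

    // Number of replicas that make up a quorum for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // A read is guaranteed to reach at least one replica holding the latest
    // acknowledged write when the read and write sets must overlap: R + W > N.
    static boolean readSeesLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3; // assumed replication factor

        // Write to one replica, read from one replica: fast, eventually consistent.
        System.out.println(readSeesLatestWrite(1, 1, rf)); // false

        // Quorum writes and quorum reads: 2 + 2 > 3, so the sets overlap.
        System.out.println(readSeesLatestWrite(quorum(rf), quorum(rf), rf)); // true
    }
}
```

For monitoring results, writing at a low consistency level keeps recording fast and available, while reads that need the latest state could use a higher level at the cost of some latency.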