MongoDB is a popular document-oriented database system that does not require designing and creating a database schema up front.
It is widely used in web application development, and in any kind of software where a NoSQL store is more effective and convenient than a relational database.
A single DB server is often enough for developing or testing an application.
Before starting to use an application in a production environment, however, it is recommended to think about the possible risks: what can happen if the data is stored on a single DB server?
If the data is not critical (e.g., it is used as a cache), the application can continue working without it, and the time needed to restore the database after a failure is acceptable, then a single server is enough.
Otherwise, if the data is considered critical and its loss or inaccessibility affects the application (and hence the service its clients receive), the recommendation is to use a cluster for data storage.
MongoDB allows building two types of clusters:
- Replica set: a group of servers storing the same data; availability of the data is ensured by replication.
- Sharded cluster: parts of the data are spread among several servers, which means horizontal scaling.
Below is the configuration process for a replica set cluster.
Replica set: server roles.
A MongoDB replica set requires at least three servers (nodes), and an odd number in general, so that elections always produce a clear majority:
- Primary server: handles all read/write requests. All operations that modify the data are recorded in a special collection named "oplog" (operation log); entries from this collection are used to perform replication (an example of inspecting the oplog is shown after this list).
- Secondary servers (minimum two): replicate the oplog collection and apply the changes to their copy of the data. This process ensures the data is identical on all servers.
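As a quick illustration of the mechanism above, the oplog can be inspected from the mongo shell on any member; it lives in the local database as the capped collection oplog.rs:

use local
db.oplog.rs.find().sort({ $natural: -1 }).limit(1).pretty()

The returned document describes the most recent data-modifying operation, which is exactly what the secondaries replay during replication.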
(Diagram: common replica set scheme with one primary and two secondary servers.)
Replica set: configuration.
Server details used in this example:
- Primary server: name — server1, IP 10.20.30.10
- Secondary server: name — server2, IP 10.20.30.20
- Secondary server: name — server3, IP 10.20.30.30
Replace the names and IP addresses with your own.
Steps to perform on all servers before continuing:
- Install MongoDB.
- If a firewall is in use, configure it to allow inbound/outbound packets on port 27017.
- Set up server name resolution: if no DNS server is in use, this can be done by adding entries for the other servers to the /etc/hosts file, for example on server1:
10.20.30.20 server2.domain.local server2
10.20.30.30 server3.domain.local server3
Important note: all configuration commands below must contain server names only! Starting from version 5.0, IP addresses are not allowed here: MongoDB validates the configuration and will fail to start if it contains IPs.
Steps for cluster configuration:
1. Start MongoDB on all servers:
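A minimal start command could look like the following (a sketch using standard mongod options; paths and extra options depend on your installation):

mongod --replSet "rs0" --bind_ip localhost,server_name

where: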
- "rs0" is the replica set name (choose another name if needed);
- server_name is the name of the server on which MongoDB is starting.
2. On the primary server, start the MongoDB shell named "mongo" (using your own credentials instead of username and password):
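For example (a sketch; add --authenticationDatabase and other options if required by your setup):

mongo -u username -p password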
3. Initialize the replication and add all servers to the cluster:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "server1.domain.local:27017" },
    { _id: 1, host: "server2.domain.local:27017" },
    { _id: 2, host: "server3.domain.local:27017" }
  ]
})
This step has to be done on the primary server only; there is no need to repeat it on the secondaries.
4. Check the replication settings with the command "rs.conf()": the output of this command shows the parameters of every server. For the primary server it should look like this:
{
"_id" : "rs0",
"version" : 1,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "server1. domain.local:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
5. Notice that every server has the default priority of 1 (the line "priority" : 1 in the output above).
The priority value is used during elections: the highest value means the highest chance of becoming the new primary server.
This makes it possible to define in advance which secondary server will be elected as the new primary, for example (run in the shell on the primary):
conf = rs.conf()
conf['members'][0].priority = 3
conf['members'][1].priority = 2
conf['members'][2].priority = 1
rs.reconfig(conf)
Thus,
- server1 is the primary whenever it is available.
- server2 becomes the new primary if server1 is inaccessible.
- if both server1 and server2 are inaccessible, server3 cannot be elected primary on its own: an election requires a majority of voting members (two out of three here), so server3 remains a read-only secondary until another server returns.
6. Check the replication status with the command "rs.status()". Verify that all servers are present in the output and check each server's state (the "stateStr" field).
Example for the primary server:
{
"_id" : 0,
"name" : "server1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 87986,
"optime" : {
"ts" : Timestamp(1526936047, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2023-03-29T12:54:07Z"),
"electionTime" : Timestamp(1526848104, 1),
"electionDate" : ISODate("2018-03-29T12:28:24Z"),
"configVersion" : 3,
"self" : true
},
The value of "stateStr" is "PRIMARY", it means server1 is indeed the primary.
For the secondary server2:
"_id" : 0,
"name" : "server2:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 97453,
"optime" : {
"ts" : Timestamp(1526936047, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2023-03-29T12:54:07Z"),
"electionTime" : Timestamp(1526848104, 1),
"electionDate" : ISODate("2018-03-29T12:28:24Z"),
"configVersion" : 3,
"self" : true
},
"stateStr" is "SECONDARY", means server2 is part of the cluster and secondary as expected. Similar output should be for server3.
By default, servers are checking state (health) of each other every 10 seconds (time period is changeable).
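The related timing can be tuned through the replica set settings; for example, the election timeout (how long the members wait before declaring the primary unreachable and starting an election, 10000 ms by default) can be changed like this:

conf = rs.conf()
conf.settings.electionTimeoutMillis = 12000
rs.reconfig(conf)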
As noted before, if the primary server becomes inaccessible, an election process is started.
According to the priority values, server2 will become the new primary; this can be seen in the "rs.status()" output, where its "stateStr" changes from "SECONDARY" to "PRIMARY".
At the same time, "stateStr" for server1 will show "(not reachable/healthy)".
Once server1 is available in the cluster again, it first catches up by replicating the data from server2; after that a new election is started and server1 becomes the primary again because of its higher priority.
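The same failover can be triggered manually, without shutting the server down, by asking the current primary to step down; the argument of the standard rs.stepDown() helper is the number of seconds during which this server will not seek re-election:

rs.stepDown(60)

While server1 is stepped down, server2 takes over; once the period expires, server1's higher priority brings it back as the primary.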
The configured cluster also makes it possible to perform maintenance on any server without impact on the application, because the data remains accessible from the other servers.
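On the application side it is enough to list all members in the connection string together with the replica set name; the driver then discovers the current primary itself and transparently switches to a new one after a failover. A sketch (add credentials and driver options as needed):

mongodb://server1.domain.local:27017,server2.domain.local:27017,server3.domain.local:27017/?replicaSet=rs0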
Conclusion
These steps are enough to set up a basic replica set cluster.
More complex configurations are also possible, including a combination of both cluster types: a sharded cluster in which every shard is itself a replica set.
Examples of such configurations will be discussed in the following articles.