
Setting Up Apache Cassandra with DC-DR Replication

Updated: Nov 11, 2025

Introduction

Apache Cassandra is a highly scalable, distributed NoSQL database designed to handle large amounts of data across many commodity servers without a single point of failure. Known for its high availability and fault tolerance, Cassandra is widely used in applications that require constant uptime and the ability to handle massive volumes of data.


Advantages of Apache Cassandra:


  • Scalability: Easily add nodes to the cluster without downtime.

  • High Availability: No single point of failure; ensures data is always accessible.

  • Fault Tolerance: Automatic data replication across nodes to ensure redundancy.

  • Performance: Handles large volumes of data with low latency.

  • Flexible Schema: Supports dynamic and flexible schema changes.


What is High Availability?


High availability refers to systems that are operational and accessible for a high percentage of time. In the context of databases like Cassandra, high availability ensures that the database remains available to users even in the event of hardware failures, network issues, or other disruptions.


Role of Replication in Cassandra


Replication in Cassandra plays a crucial role in achieving high availability. Cassandra replicates data across multiple nodes in a cluster, and across multiple data centers (DCs) in a DC-DR setup. This replication ensures that even if one node or an entire data center goes down, the data remains accessible from another replica.


Step-by-Step Guide to Setting Up Cassandra with DC-DR Replication


In this guide, we will set up Apache Cassandra with a DC-DR replication model. We will use two virtual machines (VMs) with the following IPs:


  • Data Center (DC): 192.168.96.74

  • Disaster Recovery (DR): 192.168.96.75


Prerequisites


  • Two VMs with the following specifications:

    • RAM: 3GB

    • Storage: 20GB

    • CPU: 3 cores

  • Operating System: RHEL 8

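  • Java: Cassandra 4.1 runs on Java 8 or Java 11. This guide uses Java 11, so make sure it is installed on each VM.

  • Python: Python 3 is required to run cqlsh.

  • Firewall: Open ports 7000 (inter-node), 7001 (inter-node TLS), 7199 (JMX), and 9042 (CQL clients) on each VM. Port 9160 served the Thrift interface, which was removed in Cassandra 4.0, so it is not needed for version 4.1.5.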


Steps for Deployment:


  1. Install Java 11 on both DC and DR:


  • Install OpenJDK 11.

sudo yum install java-11-openjdk-devel -y
  • Verify the installation using:

java -version
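
To check the version in a script rather than by eye, the output of java -version can be parsed. This is a minimal sketch: the function name java_major_version is hypothetical, and it assumes the version appears in double quotes on the first matching line (for example 11.0.x, or 1.8.0_x for Java 8).

```shell
# Print the major Java version found in `java -version` output read from stdin.
# Handles the modern scheme ("11.0.x" -> 11) and the legacy one ("1.8.0_x" -> 8).
java_major_version() {
    awk -F'"' '/version/ {
        split($2, parts, ".")
        if (parts[1] == "1") print parts[2]; else print parts[1]
        exit
    }'
}

# Usage on each VM (java -version writes to stderr, hence 2>&1):
#   java -version 2>&1 | java_major_version
```

A result of 11 on both VMs matches the prerequisite used in this guide.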

2. Download and Install Cassandra:

  • Download the Apache Cassandra 4.1.5 binary tarball (apache-cassandra-4.1.5-bin.tar.gz) on both VMs.

  • Extract the downloaded file:

tar -xvzf apache-cassandra-4.1.5-bin.tar.gz
  • Move the extracted files to /opt/cassandra:

sudo mv apache-cassandra-4.1.5 /opt/cassandra
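
A minimal sketch of the download, assuming the tarball is fetched from the Apache archive. The version and file name come from this guide, but the URL layout is an assumption and should be checked against the official Cassandra download page. The function name download_cassandra is hypothetical.

```shell
# Version and tarball name as used in this guide.
CASSANDRA_VERSION="4.1.5"
TARBALL="apache-cassandra-${CASSANDRA_VERSION}-bin.tar.gz"
# Assumed URL layout of the Apache archive; verify before relying on it.
DOWNLOAD_URL="https://archive.apache.org/dist/cassandra/${CASSANDRA_VERSION}/${TARBALL}"

# Download the tarball into the current directory.
# -f fails on HTTP errors, -L follows redirects, -O keeps the remote file name.
download_cassandra() {
    curl -fLO "$DOWNLOAD_URL"
}

# Show the URL that would be fetched.
printf '%s\n' "$DOWNLOAD_URL"

# Run on each VM to perform the download:
#   download_cassandra
```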

3. Configure Cassandra


Edit the cassandra.yaml configuration file on both VMs to set up DC and DR roles.


  • On the DC and DR Node (192.168.96.74 & 192.168.96.75):

sudo vim /opt/cassandra/conf/cassandra.yaml
  • Set the following parameters:

    Set the cluster name, seed provider, listen address, and snitch properties. Repeat this configuration on each VM, adjusting IP addresses accordingly.

cluster_name: 'cassandra' # Must be identical on every node in both DC and DR
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.96.74,192.168.96.75" # List the IPs of both the DC and DR VMs
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
listen_address: 192.168.96.74 # IP address of the current VM
rpc_address: 192.168.96.74 # IP address of the current VM
broadcast_rpc_address: 192.168.96.74 # IP address of the current VM
endpoint_snitch: GossipingPropertyFileSnitch

GossipingPropertyFileSnitch: This snitch uses the gossip protocol to learn about the network topology and propagates the location information (data center and rack) to other nodes in the cluster. It reads the data center and rack configuration from the cassandra-rackdc.properties file.
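
Only the address settings differ between the two VMs, so the per-node part of the edit can be scripted. This is a minimal sketch: the function name set_node_addresses is hypothetical, and it assumes listen_address and rpc_address appear uncommented at the start of a line, as they do in a stock cassandra.yaml.

```shell
# Rewrite the three per-node address settings in a cassandra.yaml file.
# $1 = path to cassandra.yaml, $2 = IP address of the current VM.
set_node_addresses() {
    yaml_file="$1"
    node_ip="$2"
    # broadcast_rpc_address may still be commented out, so an optional
    # leading "#" is accepted for that key only.
    sed \
        -e "s/^listen_address *:.*/listen_address: ${node_ip}/" \
        -e "s/^rpc_address *:.*/rpc_address: ${node_ip}/" \
        -e "s/^#\{0,1\} *broadcast_rpc_address *:.*/broadcast_rpc_address: ${node_ip}/" \
        "$yaml_file" > "${yaml_file}.tmp" && mv "${yaml_file}.tmp" "$yaml_file"
}

# Usage (run with permission to write the file):
#   set_node_addresses /opt/cassandra/conf/cassandra.yaml 192.168.96.74   # on the DC VM
#   set_node_addresses /opt/cassandra/conf/cassandra.yaml 192.168.96.75   # on the DR VM
```

The cluster name, seed list, authenticator, authorizer, and snitch are the same on both nodes and still need to be set as described above.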

4. Define the Data Centers

In the cassandra-rackdc.properties file, specify the data center and rack information.


  • On the DC and DR Node (192.168.96.74,192.168.96.75):

vim /opt/cassandra/conf/cassandra-rackdc.properties
  • Set the following parameters:

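# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node.
# Use dc=dc on the DC VM and dc=dr on the DR VM. Keep comments on their own
# lines: a properties file treats text after the value as part of the value.
dc=dc
rack=rack1

DC: dc=dc, rack=rack1

DR: dc=dr, rack=rack1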
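
The per-node values can also be written by a small script. This is a minimal sketch using the two IPs from this guide; the function names dc_for_ip and write_rackdc are hypothetical.

```shell
# Map a node IP from this guide to its data center name.
dc_for_ip() {
    case "$1" in
        192.168.96.74) echo "dc" ;;
        192.168.96.75) echo "dr" ;;
        *) echo "unknown node IP: $1" >&2; return 1 ;;
    esac
}

# Write dc and rack to a properties file, one key per line, no inline comments.
# $1 = target file, $2 = node IP, $3 = rack name (defaults to rack1).
write_rackdc() {
    dc_name=$(dc_for_ip "$2") || return 1
    printf 'dc=%s\nrack=%s\n' "$dc_name" "${3:-rack1}" > "$1"
}

# Usage (this overwrites the file, dropping its commented-out defaults):
#   write_rackdc /opt/cassandra/conf/cassandra-rackdc.properties 192.168.96.75
```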


5. Set up Environment Variables


  • Open the ~/.bashrc file in a text editor:

vim ~/.bashrc
  • Add the environment variables to the end of the file:

export CASSANDRA_HOME=/opt/cassandra
export CASSANDRA_CONF=$CASSANDRA_HOME/conf
export CLASSPATH=$CASSANDRA_HOME/lib/*:$CASSANDRA_CONF
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
export PATH=$JAVA_HOME/bin:$PATH
  • Apply the changes to your current session:

source ~/.bashrc

6. Open the Required Ports


  • Run the following commands to open the Cassandra ports in the firewall's public zone: 7000 and 7001 for inter-node traffic, 7199 for JMX, and 9042 for CQL clients. The Thrift port 9160 is not opened because Thrift was removed in Cassandra 4.0.

sudo firewall-cmd --zone=public --add-port=7000/tcp --permanent
sudo firewall-cmd --zone=public --add-port=7001/tcp --permanent
sudo firewall-cmd --zone=public --add-port=7199/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9042/tcp --permanent
  • Reload the Firewall

sudo firewall-cmd --reload
  • Verify the Changes

sudo firewall-cmd --list-all

# This command will show you all the active rules in the public zone, including the newly added ports.

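7. Start Cassandra on Both Nodes


  • On both nodes, start Cassandra from the /opt/cassandra/bin directory. The -R flag allows Cassandra to start when it is run as the root user:

cd /opt/cassandra/bin
sh cassandra -R
  • Verify the Cluster Setup

/opt/cassandra/bin/nodetool status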

Output Breakdown:


Datacenter: dc


  • Status: UN (Up and Normal) — This indicates that the node is up and functioning normally within the cluster.

  • Address: 192.168.96.74 — The IP address of the node within the "dc" (data center).

  • Load: 104.34 KiB — The amount of data stored on this node.

  • Tokens: 16 — The number of tokens assigned to this node for partitioning data.

  • Owns (effective): 100.0% — The percentage of the keyspace this node is responsible for.

  • Host ID: ce1bc1c1-7eea-40c4-8e12-b226e61c2c4c — A unique identifier for this node within the cluster.

  • Rack: rack1 — The rack this node belongs to within the data center.


Datacenter: dr

  • Similar details are provided for the second data center (dr), where the node at 192.168.96.75 is also up and running, with a load of 75.38 KiB and handling 100.0% of the tokens assigned to it.
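
The status check can be automated by filtering the nodetool output for nodes that are not UN. This is a minimal sketch: the function name nodes_not_un is hypothetical, and it assumes node lines start with the two-letter status code followed by the address, as described in the breakdown above.

```shell
# Read `nodetool status` output on stdin and print the address of every node
# that is not UN (Up and Normal). Exits non-zero if any such node is found.
nodes_not_un() {
    awk '/^[UD][NLJM] / && $1 != "UN" { print $2; bad = 1 }
         END { exit bad + 0 }'
}

# Usage:
#   /opt/cassandra/bin/nodetool status | nodes_not_un && echo "all nodes are UN"
```

Header lines such as "Datacenter: dc" do not match the status pattern and are ignored.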


8. Connect to the Cassandra Database


  • Understanding Keyspace Replication Strategies


In Cassandra, a keyspace is a namespace that defines how data is replicated on different nodes. The replication strategy defines how the replicas are placed across the nodes. Two common strategies are:


  1. SimpleStrategy: Suitable for a single data center. Not recommended for production environments.

  2. NetworkTopologyStrategy: Designed for multiple data centers. Recommended for production environments, especially when dealing with DC-DR setups.


  • Connect to Cassandra Using cqlsh


To create a keyspace, you need to connect to the Cassandra cluster using cqlsh, the command-line interface for Cassandra Query Language (CQL).


cqlsh <Cassandra_Node_IP> -u <username> -p <password>

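For example, assuming a role named admin has already been created (on a fresh installation with PasswordAuthenticator, the default superuser is cassandra with password cassandra):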

cqlsh 192.168.96.74 -u admin -p cassandra@2024

Replace <Cassandra_Node_IP>, <username>, and <password> with the appropriate values.


  • Create the Keyspace with DC-DR Replication


Now, create the keyspace using the CREATE KEYSPACE command. In this example, we'll create a keyspace named mykeyspace with replication across the two data centers. The data center names in the replication map are case-sensitive and must match the dc values in cassandra-rackdc.properties (dc and dr, as shown by nodetool status). Because each data center has a single node, we place one replica in each.

CREATE KEYSPACE mykeyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc': 1,
  'dr': 1
} AND durable_writes = true;
  1. class: Defines the replication strategy. In this case, NetworkTopologyStrategy is used for multiple data centers.

  2. dc: Specifies the number of replicas in the primary data center.

  3. dr: Specifies the number of replicas in the disaster recovery data center.
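
The statement can also be generated from the data center names reported by nodetool status (dc and dr in this guide), which avoids a mismatch between the replication map and the snitch configuration. This is a minimal sketch: the function name keyspace_cql and its name=replicas argument format are hypothetical.

```shell
# Build a CREATE KEYSPACE statement for NetworkTopologyStrategy.
# $1 = keyspace name, remaining arguments = <datacenter>=<replica count> pairs.
keyspace_cql() {
    keyspace="$1"
    shift
    map="'class': 'NetworkTopologyStrategy'"
    for pair in "$@"; do
        dc_name="${pair%%=*}"   # text before the first "="
        replicas="${pair#*=}"   # text after the first "="
        map="${map}, '${dc_name}': ${replicas}"
    done
    printf 'CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {%s} AND durable_writes = true;\n' \
        "$keyspace" "$map"
}

# Print the statement for the two single-node data centers in this guide.
keyspace_cql mykeyspace dc=1 dr=1

# To run it directly against a node:
#   cqlsh 192.168.96.74 -u <username> -p <password> -e "$(keyspace_cql mykeyspace dc=1 dr=1)"
```

Each replica count should not exceed the number of nodes in that data center, which is why both are 1 here.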


  • Verify the Keyspace Creation


After creating the keyspace, you can verify it by listing all keyspaces or describing the specific keyspace:


DESCRIBE KEYSPACE mykeyspace;

This command will display the keyspace configuration, including the replication strategy and the number of replicas in each data center.


  • Create Tables Within the Keyspace


Once the keyspace is created, you can create tables within it. Here’s an example of creating a table named emp_details within the mykeyspace keyspace:

USE mykeyspace;

CREATE TABLE emp_details (
  emp_id int PRIMARY KEY,
  emp_name text,
  emp_city text
);

  1. emp_id: The primary key for the table, uniquely identifying each row.

  2. emp_name and emp_city: Additional columns in the table.


  • Insert Data into the Table


You can now insert data into the table and it will be replicated according to the strategy defined in the keyspace:

INSERT INTO emp_details (emp_id, emp_name, emp_city) VALUES (1, 'John', 'Sam');

  • Verify Data Replication


To ensure that the data is properly replicated, you can connect to different nodes in both the DC and DR data centers and query the table:


SELECT * FROM emp_details ;


If the replication is correctly set up, you should see the same data across all nodes.
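
A SELECT can succeed on any node because the coordinator fetches the row from whichever replica holds it, so it helps to also check where the row is actually stored. The nodetool getendpoints command lists the nodes that hold a given partition key. This is a minimal sketch that checks the list against the two IPs from this guide: the function name has_all_replicas is hypothetical, and it assumes getendpoints prints one address per line.

```shell
# Read a list of endpoint addresses on stdin (one per line) and confirm that
# every IP passed as an argument is present. Reports the first missing one.
has_all_replicas() {
    endpoints=$(cat)
    for ip in "$@"; do
        printf '%s\n' "$endpoints" | grep -qxF "$ip" || {
            echo "missing replica: $ip"
            return 1
        }
    done
}

# Usage (partition key 1 is the emp_id inserted above):
#   /opt/cassandra/bin/nodetool getendpoints mykeyspace emp_details 1 \
#       | has_all_replicas 192.168.96.74 192.168.96.75 && echo "row is stored in both DC and DR"
```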


This setup ensures that your data is resilient and highly available across multiple data centers, providing fault tolerance and disaster recovery capabilities.

 
 
 
