Saturday, June 1, 2013

Cassandra Performance Tuning

In my previous post, I discussed how to stress test Cassandra. In this post, I will cover some easy steps to tune its performance. I'm a big fan of Cassandra. It is optimized for very fast and highly available writes. There are many things you can do to further optimize its write and read performance, but today I will only cover a few major tune-up steps that are easy to apply.


Dedicated Commit Log Disk: I think this is the first tune-up you may want to try, as it gives a significant performance improvement. But before changing the commit log location, it helps to know why it gives a performance boost. A Cassandra write is first appended to a commit log on disk and then written to an in-memory table structure called a Memtable. When thresholds are reached, that Memtable is flushed to disk in a format called SSTable. So if you put the commit log on its own disk, you isolate commit log I/O from the I/O of Cassandra reads, Memtable flushes and SSTables. Remember, after the flush, the commit log is no longer needed and is deleted, so the commit log disk doesn't need to be large. It just needs to be big enough to hold the Memtable data until it is flushed. You can follow the steps below to change the commit log location for Cassandra.

Step#1: Mount a separate partition for the commit log.
Step#2: Make sure the mount point has the ownership and permissions Cassandra needs.
Step#3: Edit the Cassandra configuration file, which can be found at conf/cassandra.yaml. You will find a property "commitlog_directory"; update it to match your mount location. In my case, it is:
commitlog_directory: /mnt/commitlog
Step#4: Restart your Cassandra cluster.
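
Before restarting, it can be worth a quick sanity check that the new location really is separate from the data directory. Here is a minimal sketch of such a check, assuming the paths you pass in are your own commit log and data directories (the function names are mine, not part of Cassandra). Note that it only compares filesystem device IDs, so two partitions on the same physical disk would still pass:

```python
import os


def on_same_device(path_a, path_b):
    """Return True if both paths live on the same filesystem device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def check_commitlog_isolation(commitlog_dir, data_dir):
    """Return a short status string for a commit log directory (hypothetical helper)."""
    if not os.path.isdir(commitlog_dir):
        return "missing"        # Step#1 not done: nothing at that path
    if not os.access(commitlog_dir, os.W_OK):
        return "not writable"   # Step#2 not done for the current user
    if on_same_device(commitlog_dir, data_dir):
        return "shared"         # same device as the data directory
    return "ok"
```

Run it as the same user that runs Cassandra, for example check_commitlog_isolation("/mnt/commitlog", "/your/data/dir"), so the permission check reflects what Cassandra will actually see.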


Increasing Java Heap Size: Cassandra runs on the JVM, so you might face out-of-memory issues when you run a heavy load on it. There is a rule of thumb for sizing the heap:
  • Heap Size = 1/2 of System Memory when System Memory < 2GB
  • Heap Size = 1GB when System Memory >= 2GB and <= 4GB
  • Heap Size = 1/4 of System Memory (but not more than 8GB) when System Memory > 4GB
Remember, a larger heap alone might not give you a performance boost, so a well-tuned Java heap size is very important. To change the Java heap size, update the cassandra-env.sh file and then restart the Cassandra cluster. If you are using OpsCenter, you should see the updated heap size in one of its metrics.
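
The rule of thumb above is easy to express as a small function. This is only a sketch of the three bullets, working in megabytes; the function name is mine, and the result is a starting point to put into cassandra-env.sh (I believe the relevant variable is MAX_HEAP_SIZE, but check the file shipped with your version):

```python
def recommended_heap_mb(system_memory_mb):
    """Apply the heap-size rule of thumb from this post (all values in MB)."""
    if system_memory_mb < 2048:
        # Less than 2GB of RAM: half of system memory
        return system_memory_mb // 2
    if system_memory_mb <= 4096:
        # Between 2GB and 4GB of RAM: a flat 1GB
        return 1024
    # More than 4GB of RAM: a quarter of system memory, capped at 8GB
    return min(system_memory_mb // 4, 8192)
```

For example, a node with 16GB of RAM gets a 4GB heap, and a node with 64GB hits the 8GB cap.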


Tune Concurrent Reads and Writes: Cassandra is implemented using a Staged Event-Driven Architecture (SEDA), which breaks the application into stages. The concurrent reads and concurrent writes settings control the maximum number of threads allocated to the read and write stages. Setting them to good values will improve Cassandra performance, but raising them beyond what the hardware can handle will hurt it. These values are closely tied to the number of CPU cores in the system. As with the Java heap size, there is a rule of thumb for choosing them:
  • Concurrent Reads: 4 concurrent reads per processor core
  • Concurrent Writes: Most of the time you do not need to change it, as writes are usually fast. If needed, you can set the value equal to or higher than the concurrent reads.
To change the values, update the conf/cassandra.yaml configuration file. There are two parameters for this: concurrent_reads and concurrent_writes. Update them based on your system and restart Cassandra for the change to take effect.
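
As a quick illustration, here is a small sketch that turns the rule of thumb above into numbers for the machine it runs on. The function name is mine, and the dictionary keys are just labels mirroring the cassandra.yaml setting names as I understand them (concurrent_reads and concurrent_writes); treat the output as a starting point for testing, not a final answer:

```python
import os


def suggested_concurrency(cores=None):
    """Suggest read/write concurrency from the 4-reads-per-core rule of thumb."""
    if cores is None:
        # os.cpu_count() can return None, so fall back to a single core
        cores = os.cpu_count() or 1
    reads = 4 * cores
    # Writes are set equal to reads here, the lower bound given in this post
    return {"concurrent_reads": reads, "concurrent_writes": reads}
```

On an 8-core node this suggests 32 for both values.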


Tune-Up Key Cache: For each column family, the key cache holds the location of row keys in memory. Since keys are usually small, it can cache a large number of them without using much memory. Each cache hit results in less disk activity. The key cache is enabled by default, with a default size of 200000 keys. You can change this value by altering the column family.


You can monitor key cache performance with the nodetool cfstats command.
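
To see why a key cache hit saves disk activity, here is a toy model of the idea. It is purely an illustration and not how Cassandra implements its cache: the class, its LRU eviction policy, and the find_on_disk callback are all my own assumptions. The cache maps a row key to its position, and only a miss falls through to the expensive lookup:

```python
from collections import OrderedDict


class ToyKeyCache:
    """Toy LRU cache mapping row key -> on-disk position (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.requests = 0

    def lookup(self, key, find_on_disk):
        """Return the position for key, calling find_on_disk only on a miss."""
        self.requests += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        position = find_on_disk(key)           # the expensive path
        self.entries[key] = position
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used key
        return position

    def hit_rate(self):
        """Fraction of lookups served from the cache."""
        return self.hits / self.requests if self.requests else 0.0
```

The hit rate this toy reports is the same kind of number you watch for in the monitoring output: the closer it is to 1, the fewer lookups go to disk.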



Tune-Up Row Cache: In Cassandra, the row cache is disabled by default. The row cache holds the entire content of a row's data in memory, so a column family with large rows could easily consume system memory and hurt Cassandra performance. That's why it is disabled by default and should remain disabled in most cases. But if your rows are small, using the row cache can significantly improve performance, as it keeps the most accessed rows hot in memory. To enable the row cache, alter your column family and pass the number of rows to cache.

You can also monitor it with the nodetool cfstats command, as above (watch for "Row cache hit rate").


Conclusion: As I said earlier, these are only some of the tune-up steps; there are more (a high-performing RAID level, file system optimization, disabling swap, memory-mapped disk modes and so on). But I have given you something to start with, and once you see Cassandra's performance improve you can try the rest of the tuning. Cassandra is highly scalable, and scaling up is done by enhancing each node (more RAM, higher network throughput, SSDs, disk size, etc.). Remember, if you are using AWS EC2, do not expect much performance improvement on small or medium instance types, as they are not optimized for I/O or network performance. Use xlarge or bigger instances instead.

And finally, DO NOT forget to check the Cassandra Performance and Scalability slides by Adrian Cockcroft.


Note: For privacy reasons, I had to modify several lines in this post from my original post. So if you find that something is not working or you face any issues, please do not hesitate to contact me :)

33 comments:

  8. Thank you for the link on Adrian Cockroft's slides about Cassandra Performance and Scalability, it turned to be a piece of very useful information.
