Introduction to Cassandra Query Language (CQL)

This post is to show how to insert and retrieve data from Cassandra

April 19, 2018 Pavan Kulkarni

9 minute read

This post gives a quick introduction to Cassandra using CQL.

To install Cassandra on your local please read this - Install Apache Cassandra

Cassandra Query Language

CQL treats the database (Keyspace) as a container of tables. Programmers use cqlsh: a prompt to work with CQL or separate application language drivers. Clients approach any of the nodes for their read-write operations. That node (coordinator) plays a proxy between the client and the nodes holding the data.

Write Operations Every write activity of nodes is captured by the commit logs written in the nodes. Later the data will be captured and stored in the mem-table. Whenever the mem-table is full, data will be written into the SStable data file. All writes are automatically partitioned and replicated throughout the cluster. Cassandra periodically consolidates the SSTables, discarding unnecessary data.

Read Operations During read operations, Cassandra gets values from the mem-table and checks the bloom filter to find the appropriate SSTable that holds the required data.

Commonly used Cassandra terminology

Term	Description
Node	This is where data is stored
Cluster	Collection of nodes
Keyspace	A logical grouping of tables that all have the same replication factor and replication strategy.
Table	A CQL table is a logical grouping of partitions that share the same schema.
Partition	A partition is a collection of sorted CQL rows. A partition is the unit of data access as most queries access a single partition. This is also a physical unit of data that consists of a collection of sorted cells and is identified by a partition key.
Row	A CQL row is an abstract data structure that is a collection of sorted columns. When a partition has multiple rows, the rows are physically sorted on disk per the clustering column(s).
Primary key	The CQL primary key is a composite key that defines the partition key and, optionally, one or more clustering columns. The partition key itself may be defined as a single key or composite key.
Partition key	The partition key is the first component of the primary key and must be unique within a CQL table. The partition key may be defined as a composite key if it is surrounded by parentheses and supplied with a comma separated list of values. This contains a hashed value that provides fast access to a partition and that must be unique within a table.
Clustering column	A clustering column is used to allow a partition to have multiple rows where each row is sorted per the clustering column(s)
Tombstone	The tombstone is cell property that represents a deletion marker. Cassandra will remove cells that contain a tombstone from the partition that is returned to a client during a read.
Timestamp	The timestamp is a cell property that is used internally by Cassandra to keep track of when a cell was inserted or updated. It is also used by Cassandra’s merge process during a read request.
TTL	TTL, or time to live, is a cell property that you can set to define the date/time after which a cell will be automatically deleted by Cassandra.

Demo

Create a keyspace

Pavans-MacBook-Pro:cassandra pavanpkulkarni$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> CREATE KEYSPACE test_keyspace
   ... WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 3};
cqlsh> desc keyspaces

system_schema  system              test_keyspace
system_auth    system_distributed  system_traces

cqlsh> use test_keyspace ;
cqlsh:test_keyspace>

Create table/ columnFamily

cqlsh:test_keyspace> CREATE TABLE courses(
             ...    CID Text Primary Key,
             ...    CNAME Text
             ...    );

cqlsh:test_keyspace> desc tables

courses

cqlsh:test_keyspace>

Insert values in courses table


cqlsh:test_keyspace> INSERT INTO courses (cid, cname) VALUES('CS001', 'Basics of CS');
cqlsh:test_keyspace> INSERT INTO courses (cid, cname) VALUES('CS002', 'DBMS');
cqlsh:test_keyspace> INSERT INTO courses (cid, cname) VALUES('CS003', 'Basics of Java');
cqlsh:test_keyspace> INSERT INTO courses (cid, cname) VALUES('CS004', 'Intro to Big Data');
cqlsh:test_keyspace> INSERT INTO courses (cid, cname) VALUES('CS005', 'Design and Analysis of Algorithms');
cqlsh:test_keyspace> INSERT INTO courses (cid, cname) VALUES('CS006', 'System Software Analysis');
cqlsh:test_keyspace> INSERT INTO courses (cid, cname) VALUES('CS007', 'Network Security');
cqlsh:test_keyspace> INSERT INTO courses (cid, cname) VALUES('CS008', 'Science of Programming');
cqlsh:test_keyspace> INSERT INTO courses (cid, cname) VALUES('CS009', 'Machine Learning - Intro');
cqlsh:test_keyspace> INSERT INTO courses (cid, cname) VALUES('CS010', 'Cloud Computing');
cqlsh:test_keyspace>
cqlsh:test_keyspace> select * from courses;

 cid   | cname
-------+-----------------------------------
 CS002 |                              DBMS
 CS009 |          Machine Learning - Intro
 CS004 |                 Intro to Big Data
 CS008 |            Science of Programming
 CS007 |                  Network Security
 CS003 |                    Basics of Java
 CS006 |          System Software Analysis
 CS001 |                      Basics of CS
 CS005 | Design and Analysis of Algorithms
 CS010 |                   Cloud Computing

(10 rows)
cqlsh:test_keyspace>

Let’s go ahead and create some more complex tables in our test_keyspace

We want to create Students table with User Defined Type (UDT) course_taken. User-defined types (UDTs) can attach multiple data fields, each named and typed, to a single column. The fields used to create a UDT may be any valid data type, including collections and other existing UDTs. Once created, UDTs may be used to define a column in a table.

cqlsh:test_keyspace>
cqlsh:test_keyspace> CREATE TABLE students(
                 ...    id int,
                 ...    name text,
                 ...    courses_registered list<frozen<courses_taken>>,
                 ...    year_graduated text,
                 ...    PRIMARY KEY (id, year_graduated)
                 ...    );
InvalidRequest: Error from server: code=2200 [Invalid query] message="Unknown type test_keyspace.courses_taken"
cqlsh:test_keyspace>
cqlsh:test_keyspace> CREATE Type courses_taken(
                 ...    CID Text,
                 ...    Sem Text
                 ...    );
cqlsh:test_keyspace> CREATE TABLE students(
                 ...    id int,
                 ...    name text,
                 ...    courses_registered list<frozen<courses_taken>>,
                 ...    year_graduated text,
                 ...    PRIMARY KEY (id, year_graduated)
                 ...    );
cqlsh:test_keyspace>
cqlsh:test_keyspace> desc tables;

students  courses

cqlsh:test_keyspace> desc type courses_taken;

CREATE TYPE test_keyspace.courses_taken (
    cid text,
    sem text
);
cqlsh:test_keyspace>

As you can see above, students table is create with compound keys with id as PARTITION KEY and year_grduated as CLUSTERING KEY. A compound primary key consists of more than one column; the first column is the partition key, and the additional columns are clustering keys. The Partition Key is responsible for data distribution across your nodes. The Clustering Key is responsible for data sorting within the partition.

Now, let’s insert more records in Students table using the below insert statements.

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (001, ‘Tom Riddle’, [{ CID : ‘CS001’, sem: ‘Spring_2001’}, {CID : ‘CS002’, sem : ‘Summer_2001’} , {CID : ‘CS001’, sem : ‘Fall_2001’}], ‘2001’ );

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (002, ‘Ned Stark’, [{ CID : ‘CS003’, sem: ‘Spring_2002’}, {CID : ‘CS004’, sem : ‘Summer_2002’} , {CID : ‘CS005’, sem : ‘Fall_2002’}], ‘2002’ );

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (003, ‘Haan Solo’, [{ CID : ‘CS003’, sem: ‘Spring_2002’}, {CID : ‘CS004’, sem : ‘Summer_2002’} , {CID : ‘CS005’, sem : ‘Fall_2002’}], ‘2002’ );

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (004, ‘Frodo Baggins’, [{ CID : ‘CS009’, sem: ‘Spring_2003’}, {CID : ‘CS010’, sem : ‘Summer_2003’} , {CID : ‘CS004’, sem : ‘Fall_2003’}], ‘2003’ );

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (005, ‘Sheldon Cooper’, [{ CID : ‘CS004’, sem: ‘Spring_2004’}, {CID : ‘CS005’, sem : ‘Summer_2004’} , {CID : ‘CS003’, sem : ‘Fall_2004’}], ‘2004’ );

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (006, ‘Tony Stark’, [{ CID : ‘CS009’, sem: ‘Spring_2005’}, {CID : ‘CS006’, sem : ‘Summer_2005’} , {CID : ‘CS004’, sem : ‘Fall_2005’}], ‘2005’ );

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (007, ‘Stephan Hawkings’, [{ CID : ‘CS004’, sem: ‘Spring_2006’}, {CID : ‘CS005’, sem : ‘Summer_2006’} , {CID : ‘CS003’, sem : ‘Fall_2006’}], ‘2006’ );

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (008, ‘Cerci Lannister’, [{ CID : ‘CS001’, sem: ‘Spring_2007’}, {CID : ‘CS003’, sem : ‘Summer_2007’} , {CID : ‘CS009’, sem : ‘Fall_2007’}], ‘2007’ );

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (009, ‘Wonder Woman’, [{ CID : ‘CS006’, sem: ‘Spring_2008’}, {CID : ‘CS007’, sem : ‘Summer_2008’} , {CID : ‘CS009’, sem : ‘Fall_2008’}], ‘2008’ );

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (010, ‘Hermoine Granger’, [{ CID : ‘CS010’, sem: ‘Spring_2009’}, {CID : ‘CS002’, sem : ‘Summer_2009’} , {CID : ‘CS007’, sem : ‘Fall_2009’}], ‘2009’ );

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (011, ‘Peter Parker’, [{ CID : ‘CS001’, sem: ‘Spring_2010’}, {CID : ‘CS002’, sem : ‘Summer_2010’} , {CID : ‘CS005’, sem : ‘Fall_2010’}], ‘2010’ );

INSERT INTO students (id, name, courses_registered, year_graduated) VALUES (005, ‘Pika Achu’, [{ CID : ‘CS001’, sem: ‘Spring_2010’}, {CID : ‘CS002’, sem : ‘Summer_2010’} , {CID : ‘CS005’, sem : ‘Fall_2010’}], ‘2010’ );

cqlsh:test_keyspace> select * from students  ;
 id | year_graduated | courses_registered                                                                                         | name
----+----------------+------------------------------------------------------------------------------------------------------------+------------------
  5 |           2004 | [{cid: 'CS004', sem: 'Spring_2004'}, {cid: 'CS005', sem: 'Summer_2004'}, {cid: 'CS003', sem: 'Fall_2004'}] |   Sheldon Cooper
  5 |           2010 | [{cid: 'CS001', sem: 'Spring_2010'}, {cid: 'CS002', sem: 'Summer_2010'}, {cid: 'CS005', sem: 'Fall_2010'}] |        Pika Achu
 10 |           2009 | [{cid: 'CS010', sem: 'Spring_2009'}, {cid: 'CS002', sem: 'Summer_2009'}, {cid: 'CS007', sem: 'Fall_2009'}] | Hermoine Granger
 11 |           2010 | [{cid: 'CS001', sem: 'Spring_2010'}, {cid: 'CS002', sem: 'Summer_2010'}, {cid: 'CS005', sem: 'Fall_2010'}] |     Peter Parker
  1 |           2001 | [{cid: 'CS001', sem: 'Spring_2001'}, {cid: 'CS002', sem: 'Summer_2001'}, {cid: 'CS001', sem: 'Fall_2001'}] |       Tom Riddle
  8 |           2007 | [{cid: 'CS001', sem: 'Spring_2007'}, {cid: 'CS003', sem: 'Summer_2007'}, {cid: 'CS009', sem: 'Fall_2007'}] |  Cerci Lannister
  2 |           2002 | [{cid: 'CS003', sem: 'Spring_2002'}, {cid: 'CS004', sem: 'Summer_2002'}, {cid: 'CS005', sem: 'Fall_2002'}] |        Ned Stark
  4 |           2003 | [{cid: 'CS009', sem: 'Spring_2003'}, {cid: 'CS010', sem: 'Summer_2003'}, {cid: 'CS004', sem: 'Fall_2003'}] |    Frodo Baggins
  7 |           2006 | [{cid: 'CS004', sem: 'Spring_2006'}, {cid: 'CS005', sem: 'Summer_2006'}, {cid: 'CS003', sem: 'Fall_2006'}] | Stephan Hawkings
  6 |           2005 | [{cid: 'CS009', sem: 'Spring_2005'}, {cid: 'CS006', sem: 'Summer_2005'}, {cid: 'CS004', sem: 'Fall_2005'}] |       Tony Stark
  9 |           2008 | [{cid: 'CS006', sem: 'Spring_2008'}, {cid: 'CS007', sem: 'Summer_2008'}, {cid: 'CS009', sem: 'Fall_2008'}] |     Wonder Woman
  3 |           2002 | [{cid: 'CS003', sem: 'Spring_2002'}, {cid: 'CS004', sem: 'Summer_2002'}, {cid: 'CS005', sem: 'Fall_2002'}] |        Haan Solo

(12 rows)
cqlsh:test_keyspace>

We can now run some sample CQL queries to fetch data from Cassandra.

cqlsh:test_keyspace> select * from students where id=8;

 id | year_graduated | courses_registered                                                                                         | name
----+----------------+------------------------------------------------------------------------------------------------------------+-----------------
  8 |           2007 | [{cid: 'CS001', sem: 'Spring_2007'}, {cid: 'CS003', sem: 'Summer_2007'}, {cid: 'CS009', sem: 'Fall_2007'}] | Cerci Lannister

(1 rows)
cqlsh:test_keyspace> select * from students where id=5 order by year_graduated;

 id | year_graduated | courses_registered                                                                                         | name
----+----------------+------------------------------------------------------------------------------------------------------------+----------------
  5 |           2004 | [{cid: 'CS004', sem: 'Spring_2004'}, {cid: 'CS005', sem: 'Summer_2004'}, {cid: 'CS003', sem: 'Fall_2004'}] | Sheldon Cooper
  5 |           2010 | [{cid: 'CS001', sem: 'Spring_2010'}, {cid: 'CS002', sem: 'Summer_2010'}, {cid: 'CS005', sem: 'Fall_2010'}] |      Pika Achu

(2 rows)
cqlsh:test_keyspace>

More on Partition Key and Cluster Keys in the reference section

blog

Introduction to Cassandra Query Language (CQL)

Cassandra Query Language

Commonly used Cassandra terminology

Demo

References

Pavan's Blog

Recent Posts

Introduction to Kafka

Spark Structured Streaming - File-to-File Real-time Streaming (3/3)

Spark Structured Streaming - Socket Word Count (2/3)

Spark Structured Streaming - Introduction (1/3)

Categories

About

Home

About Me

Blog