Redis: What, Why, How

By Ivaylo Pavlov (26 April 2020)

A brief history

Remote Dictionary Server (Redis) was created by Salvatore Sanfilippo in 2009. It was born out of the necessity to store viewership metrics for a few websites and display them in real time on a web page.

It was originally prototyped in Tcl. After a successful proof of concept, he rewrote it in C, added a fork-based persistence feature, and open-sourced it on GitHub.


The earliest adopters were GitHub and Instagram.
 

In 2011, Ofer Bengal created Redis Labs, the current lead sponsor company behind Redis. In 2015, Salvatore joined as an open-source development lead.

As of May 2020, it is used by GitHub, Twitter, StackOverflow, Flickr and many others, with adoption heavily driven by the cloud providers and enterprise offerings.

The landscape of storage systems

Redis
  • Storage type: In-memory NoSQL database
  • Storage options: Strings (text, numbers, datetime, boolean), Lists, Sets, Hashes, ZSets, Streams, customizable types
  • Query types: CRUD commands, bulk operations, partial transaction support
  • Extras: Master/Follower replication, disk persistence, sharding, Publish/Subscribe, stored procedures, pluggable module system

Memcached
  • Storage type: In-memory key-value cache
  • Storage options: Key-value mappings (strings)
  • Query types: CRUD commands
  • Extras: Multithreaded server

PostgreSQL
  • Storage type: On-disk relational database (RDBMS)
  • Storage options: Tables of rows and columns (text, numbers, datetime, boolean), views over tables, customizable types (XML, JSON, etc.)
  • Query types: CRUD commands, custom stored procedures, full transactional support
  • Extras: ACID operations, Master/Follower replication, Multi-Master replication, extensible

MongoDB
  • Storage type: On-disk NoSQL database
  • Storage options: Tables of schemaless Binary JSON (BSON) documents (text, numbers, datetime, boolean)
  • Query types: CRUD commands, conditional queries, full transactional support
  • Extras: ACID operations, map-reduce support, Master/Follower replication, sharding, spatial indices

The problems it solves

The web is the primary use case

  • Session Data Storage (speed, auto-expiring keys)
  • Full page cache
  • Online store (speed, transactions)
  • Any type of user metrics (viewership, location)
  • Rankings and Leaderboards (see the example after this list)
  • Search (support for inverted indices, scoring)
  • Limited aggregations (min, max, median, mean)
  • Generic key-value store
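
As a quick illustration of two of the use cases above (key names and values are made up; the commands themselves are covered later in "The interface"):

// Session data with an auto-expiring key (expires after 600 seconds)
SET session:abc123 user_42 EX 600

// Leaderboard with a Sorted Set: add scores, then read the top 3
ZADD leaderboard 132 "alice" 89 "bob" 411 "carol"
ZREVRANGE leaderboard 0 2 WITHSCORES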

Functionality is closer to a NoSQL database than just an in-memory cache

It's really, really fast

Everything is in-memory

Sticks to the basics - data structures are standard C implementations

Below are some benchmarks and comparisons; for the caveats, refer to the links.

More benchmarks on redis.io/topics/benchmarks

JSON blob benchmark, Redis vs PostgreSQL:
  • GET: 0.53 ms vs 8.66 ms (~16x faster)
  • SET: 0.44 ms vs 8.59 ms (~20x faster)

1 million GET/SET operations, Redis vs Memcached:
  • User time: 8.95 s vs 8.64 s
  • System time: 20.59 s vs 19.37 s

Redis ops/sec on a Xeon E5520 (2.27 GHz):
  • SET: 552,028.75
  • GET: 707,463.75
  • List push: 767,459.75
  • List pop: 770,119.38

Data structures

  • String: strings (encoding agnostic), integers (32/64-bit), floats (IEEE 754)
  • List: linked list of strings
  • Set: unordered collection of unique strings
  • Hash: unordered hash table of keys to values
  • ZSet (Sorted Set): ordered map of string to float, sorted by score
  • Stream: append-only log with different consumers for one data stream (Kafka-like)
  • HyperLogLog: estimates the number of unique items in a space-efficient manner (Bloom filter-like)
  • Bitmap: string (char[]) with bit-oriented commands
  • Geospatial Index: encodes latitude and longitude (ZSet with the Geohash algorithm)

Data structures (Continued)

// String Representation (Text, Integer, Float)
{
   "key_to_text": "Hello World!",
   "key_to_int": 14,
   "key_to_float": 3.34
}

// List of Integers
{
   "key_to_list": [1,2,3,4]
}

// Set of Text Strings
{
   "key_to_set": {"foo","bar","baz"}
}
// Hashmap Representation
{
   "key_to_hashmap": {
     "some_text": "hello world",
     "a_number": 42
   } 
}

// ZSet Representation
{
   "key_to_highscore": {
     "Ivo": 10.99999,
     "John": 9.8888,
     "Peter": 7.8888
   }
}
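
For reference, the representations above map directly to standard commands; a minimal sketch reusing the key names from the examples (the interface is covered in more detail later):

// String
SET key_to_text "Hello World!"

// List
RPUSH key_to_list 1 2 3 4

// Set
SADD key_to_set "foo" "bar" "baz"

// Hash
HSET key_to_hashmap some_text "hello world" a_number 42

// ZSet
ZADD key_to_highscore 10.99999 "Ivo" 9.8888 "John" 7.8888 "Peter"

// Stream (auto-generated entry ID)
XADD key_to_stream * event "signup"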

A common misconception is that inside a hash, you can have another data structure like a list or set. Unfortunately, you can't nest data structures in Redis.

  • A workaround is serializing/deserializing on the client side and storing the value as a string (see the sketch below)
  • Another alternative is to store the nested structure under a separate key, but that requires a redesign
  • There are other available solutions resolved with modules (discussed later)
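
A minimal sketch of the first two workarounds (key names and values are made up):

// Workaround 1: serialize the nested structure client-side and store it as a plain string
SET user:42:profile "{\"name\": \"Ivo\", \"tags\": [\"admin\", \"beta\"]}"

// Workaround 2: keep the nested part under its own key and reference it by convention
HSET user:42 name "Ivo" tags_key user:42:tags
RPUSH user:42:tags "admin" "beta"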

Publish/Subscribe offering

Clients can subscribe to channels and listen for published messages on these channels.

2 Shortcomings

Redis system reliability

  • Before: If a subscribed client was a slow consumer, Redis would hold a large outgoing buffer for it. If the buffer grew too large, Redis slowed down or the OS would kill the Redis process.
  • Today: Redis will disconnect the subscriber based on a configured buffer limit threshold.

Data transmission reliability

  • If there's a network failure and a client is disconnected, any message published before it can reconnect will never be seen by that client - in short, an at-most-once delivery guarantee.
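
Despite these caveats, the basic interaction is simple; a minimal sketch with a made-up channel name:

// Client A: subscribe to a channel and block waiting for messages
SUBSCRIBE notifications

// Client B: publish a message; the reply is the number of subscribers that received it
PUBLISH notifications "deploy finished"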


TL;DR, this is not the strong side of Redis; for a proper pub/sub system,
you should be looking either at Redis Streams or Apache Kafka.

The interface

A few examples

Everything is commands

The entire list is available on redis.io/commands

// Adding a string with key "ivo" and value "1"
SET ivo 1

// Getting the created key:
GET ivo
"1"

// Appending to a list:
RPUSH mykey a b c d

// Indexing a list:
LINDEX mykey 0
"a"

You can try online at try.redis.io

Transactions

All commands are atomic, i.e. a change is visible to all clients immediately.

However, there's only partial support for transactions, and they are not the same as transactions in relational databases:

  • Cannot be rolled back
  • Optimistic locking

It's just a collection of commands that are executed without interruption, wrapped between MULTI and EXEC

MULTI
COMMAND 1
COMMAND 2
....
EXEC

1) The main goal is to remove race conditions
2) A secondary use case is reducing server and client round trips
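
The optimistic locking mentioned above is done with WATCH: if a watched key is modified by another client before EXEC, the transaction is aborted. A minimal sketch (key name and values are made up):

WATCH account:42:balance
GET account:42:balance
MULTI
SET account:42:balance 90
EXEC

// If another client changed account:42:balance after WATCH, EXEC returns (nil),
// none of the queued commands run, and the client simply retries.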

Client integrations

The full list is available on redis.io/clients

Python, NodeJS, C++, C#, Lua, Scala, OCaml, C, Java, Rust, Ruby, PHP

The breadth of client integrations speaks volumes about Redis's popularity and the range of problems it solves across software engineering.

Data persistence & recovery

Fully discussed on redis.io/topics/persistence

2 ways of storing data on disk, both compressible

  • Weak persistence: Snapshot
    - Fork and Copy on a certain interval to a .rdb file
    - Faster to spawn a copy:
    • Async Background copy of 50 GiB (~20min)
    • Sync Foreground copy of 50 GiB (~5min)
  • Strong persistence: Append-only file (AOF)
    - Log file of changes
    - Minimizes data loss to <1 second and faster to persist on disk
    - Slower to spin up for large log files


If Redis runs out of memory, it will start using swap and performance will degrade.
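
Both persistence modes can be toggled at runtime; a minimal sketch using CONFIG SET (the thresholds below are illustrative, not recommendations):

// Snapshotting: save if at least 1 key changed in 900s, or at least 10 keys in 300s
CONFIG SET save "900 1 300 10"

// Append-only file: enable AOF and fsync it once per second
CONFIG SET appendonly yes
CONFIG SET appendfsync everysec

// Trigger a snapshot or an AOF rewrite manually
BGSAVE
BGREWRITEAOF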


Replication

  • Single Master doing writes and propagating changes to followers
  • No master-master replication

The replication process in a nutshell (a command sketch follows the steps):

1) The master starts a background snapshot and begins holding a backlog of writes from the snapshot start time

2) The follower gets wiped out entirely

3) Follower starts syncing to the snapshot

4) Once the follower is synced, the master starts sending the write backlog to the follower

5) Master and follower are up to date
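
Setting up a follower takes a single command on the replica; a minimal sketch (host and port are placeholders, and REPLICAOF is the Redis 5+ spelling of the older SLAVEOF):

// On the follower: start replicating from the master
REPLICAOF 10.0.0.1 6379

// Check replication status on either side
INFO replication

// Promote the follower back to a standalone master
REPLICAOF NO ONE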


Replication (Continued)

Diskless replication

During a resync, the snapshot file (*.rdb) is written to disk and then fetched from disk by the replica; if your disk is slow, replication speed suffers. Diskless replication streams the snapshot directly over the wire to the replica, skipping the disk, which increases replication speed and alleviates load on the master.
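
Diskless replication is toggled through configuration; a minimal sketch (the delay value is illustrative):

// Stream the RDB snapshot directly to replicas instead of writing it to disk first
CONFIG SET repl-diskless-sync yes

// Wait a few seconds before starting, so multiple replicas can share one transfer
CONFIG SET repl-diskless-sync-delay 5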


Followers can have their own followers, resulting in replication chaining.

This is useful if you have too many followers and want to avoid slowing down your master with constant snapshots, as each new follower joining kicks off the creation of a new snapshot. Chained followers all sync up to the same master snapshot.

Sentinel

Provides high availability for Redis as a distributed system:

  • Monitoring - constant health checks of the instances
  • Notification - if something fails, alert the stakeholders
  • Automatic Failover - if master fails, elects a follower as the new master and broadcasts the change
  • Configuration provider - authority for client service discovery, clients connect to a sentinel, which makes the failover transition seamless for end users


Sentinel Configuration Advice

Always deploy multiple Sentinel instances:

  • A consensus needs to be achieved to do a failover, leading to fewer false positives (see the quorum in the sketch below)
  • Sentinel remains operational even if not all Sentinel processes are working
  • There is no single point of failure
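
A minimal sketch of the corresponding Sentinel commands, issued to a running Sentinel instance (conventionally on port 26379); the master name, address, and quorum are made up:

// Monitor a master, requiring 2 Sentinels to agree before declaring it down
SENTINEL MONITOR mymaster 10.0.0.1 6379 2

// Service discovery: ask Sentinel where the current master is
SENTINEL GET-MASTER-ADDR-BY-NAME mymaster

// Force a failover (e.g. for testing)
SENTINEL FAILOVER mymaster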

Sharding

Redis Cluster offers automatic sharding across nodes:

  • Every key is part of a hash slot. There are 16384 hash slots.
  • The partition is assigned by CRC16(key) mod 16384 (see the sketch below)
  • This allows for easier addition and removal of nodes, by redistributing only part of the keys among the new number of shards.

Increases cluster resiliency in case of node failure

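You can inspect the slot mapping yourself with CLUSTER KEYSLOT (the key names are made up):

// Returns the hash slot (0-16383) the key maps to
CLUSTER KEYSLOT user:1000

// Keys sharing the same {hash tag} map to the same slot, enabling multi-key operations
CLUSTER KEYSLOT {user:1000}.followers
CLUSTER KEYSLOT {user:1000}.following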

Client-side caching

Cache keys on client-side:

  • Latency reduction
  • Fewer server-client round trips
  • You can get by with fewer Redis nodes
  • With RESP3, invalidation messages and GETs/SETs can share the same connection

The server keeps a list of the keys each client has requested, so it knows which clients to send a key invalidation message to when a value changes.


Key Clients caching the key
foo A, B
bar C
baz A, D

This becomes inefficient when there are many client connections fetching millions of keys, as the table has a large memory footprint on the server side.

To bound the server-side memory overhead, Redis keeps the invalidation table to a limited number of caching slots.
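
With Redis 6 and RESP3, this is driven by a couple of commands; a minimal sketch (the key name is made up):

// Switch the connection to the RESP3 protocol
HELLO 3

// Ask the server to remember which keys this connection reads
CLIENT TRACKING ON

// Reading a key now subscribes this client to its invalidation messages;
// when another client later overwrites foo, this connection receives an invalidation push for it
GET foo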

Pluggable Modules System

  • Created because of constant scope creep, a common occurrence for a successful open-source project; available since Redis 4
  • Modules are written in C
  • SDK is provided on GitHub
  • Also offers Lua scripting since Redis 2.6

Ranked Popular Modules on https://redis.io/modules

  • A few popular modules
    • neural-redis - Trainable neural networks as a native data type
    • RediSearch - Full-text search over Redis
    • rediSQL - Full SQL capabilities embedding SQLite
    • RedisJSON - JSON as a native data type for Redis
      • Solves the nested-structures problem with hashmaps discussed earlier

What's next (version 6)

  • SSL on all channels
  • Access Control List (ACL) support - define users that can run only certain commands and/or can only access certain key patterns (see the sketch after this list).
  • New APIs for the modules system - store arbitrary module private data in RDB files, server event hooks, capture and rewrite commands executions, block clients on keys and others.
  • Early server-side support for client-side caching (discussed above)
  • RESP3 protocol with richer, more semantic replies
  • Diskless replication is now supported even on replicas: under conditions the user can configure, a replica can load the RDB from the initial synchronization directly from the socket into memory.
  • Threads to handle I/O, allowing a single instance to serve twice as many operations per second when pipelining cannot be used.
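
As a quick sketch of the ACL feature above (user name, password, and key pattern are made up):

// Create a user who can only run GET/SET, and only on keys starting with cache:
ACL SETUSER alice on >s3cret ~cache:* +get +set

// Authenticate as that user
AUTH alice s3cret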

Sources

NB! Some of the concepts mentioned are still subject to change, like the client-side caching implementation, which is in its preliminary stage.
Redis is significantly better documented than Memcached. If there is anything you would like to gain an in-depth understanding of, go for the official documentation.

Thank you for listening!