CSci 551: Fall 2009: Implement the Dynamo Key-value Store

Introduction

Many networking designs are often studied in a simulator because it can be logistically difficult and time-consuming to implement a design in a real system of routers and hosts. The most common kind of simulation is discrete-event simulation.

In this project, you are given an implementation of a discrete-event simulator written in C++. Such a simulator, which is usually implemented as a single UNIX process, mimics the behavior of a network using a discrete-event engine which manages the communication between routers. The crucial aspect of this engine is that it simulates time and will give you a realistic sense of the sequence of events in a real-world implementation.

To understand more about discrete-event simulation, you can search the Web and read the many sites devoted to explaining the concept. You should also download and read the code for the discrete event simulator called netsim.

For this project, you will implement a subset of the Dynamo key-value store, the technology underlying Amazon's shopping cart and other services, in netsim.

There are two parts to the project. Part 1 is due at 5pm on October 9th, 2009. Part 2 is due at 5pm on November 29th, 2009.

The Netsim Simulator

Netsim takes a topology description (and other information described below) as input. You can specify nodes and links in the topology, and the link bandwidths and propagation latencies. Netsim also implements code that lets you send a packet from any node in the network, to any other node using shortest-path routing. Each node has a limited size packet queue (whose size you can configure), and if a packet arrives at a node when this queue is full, that packet is dropped. (In this project, as we describe below, you may not be using this functionality of netsim).

Thus, netsim provides the abstraction of a datagram network, on top of which you can implement several networking protocols and subsystems. In a previous edition of this class, students implemented an almost complete version of TCP on top of this simulator! In general, to implement a specific network functionality, you would use C++ inheritance and define one or more C++ classes that inherit from a base class defined in netsim.

To give you a feel for how to implement a protocol on top of netsim, we have implemented a simple constant bit-rate transmission protocol: this protocol sends packets periodically from a specified sender to a specified receiver. See the cbr_app directory for how this protocol is implemented using Netsim.

To get started on the project, we recommend that you follow these steps:

  • Download the netsim code and extract it.
  • Print out the code and read it carefully to understand what each C++ class does. Start from Scheduler, and read PacketScheduler and Timer: these three files implement the heart of the discrete-event engine. Topology implements the routing algorithms, while FIFONode implements a router with a bounded packet queue. This code is not particularly well documented, but the level of documentation is fairly typical of what you will find when you are working on collaborative projects in industry or academia.
  • You may also need to familiarize yourself with the C++ Standard Template Library.
  • Look at the cbr_app directory and familiarize yourself with our implementation of the constant bit rate (CBR) transmission protocol. As part of this, you should also understand the configuration language in Netsim. See the examples sub-directory for examples of how to configure a network using Netsim, and how to start the CBR sender and receiver.
  • A good way to get familiar with the code and how the simulator works is to compile (using make) and run the cbr_app code, examine the trace output (you need to figure out how this works), and single step through the code using the Gnu debugger gdb.

Overview

Before getting started on the project, you should read the Dynamo paper thoroughly. Dynamo provides an interface for storing (putting) a key-value pair, and for retrieving (getting) the value given a key. It implements this key value storage system on a collection of servers using consistent hashing for load-balancing (you might also want to feed the Chord paper for the origins of this idea), and implements quorum-based replication for fault tolerance.

In this project, you will be implementing the following subset of Dynamo.

  • System Interface (Section 4.1): Your system should provide the put() and get() interface calls for storing and retrieving key value pairs.
  • Partitioning Algorithm (Section 4.2): You will be implementing the partitioning algorithm where each server/node creates a number of virtual nodes (4 in your implementation). Each virtual node assigns itself a position on the ring determined by randomly hashing its identifier, and assumes responsibility for part of ring's identifier space.
  • Replication Algorithm (Section 4.3): You will implement the replication algorithm by replicating each put on 5 distinct physical nodes (you can assume that any topology will contain at least five physical nodes).
  • You do not need to implement versioning (Section 4.4).
  • Executing get() and put() operations (Section 4.5): you will be implementing a simplified version of a partition aware strategy, where every physical node on the ring will be able execute a get() or put() operation. Each node will contain a map of the entire ring, using which it can contact the appropriate replicas.
  • Transient failures and quorum (Section 4.5): Your implementation will also support the transient system failures by permitting a read if at least 3 physical replicas have the key requested, and succeeding on a write only if 3 physical replicas are up.
  • You will not need to implement hinted handoff (Section 4.6), replica synchronization to tolerate permanent failure (Section 4.7), or membership leaves and joins (Section 4.8).

Part 1

For part 1 of the project, you should implement:

  • the system interface
  • the partitioning algorithm
  • put and get operations

Part 1 is due at 5pm on October 9th, 2009.

Part 2

For part 2 of the project, you should implement:

  • replication
  • transient failures and quorum.

Part 2 is due at 5pm on November 29th, 2009.

Simplifying Assumptions

Your implementation will make several simplifying assumptions, which will deviate from reality. This deviation will give you a sense for how far you are off from the actual deployed system.

  • Topology: You can assume that each physical node has a direct network link to every other physical node in the system. In a practical setting, Amazon implements the service in a data center whose network topology resembles a tree (multiple hops between nodes).
  • Inter-node communication: You can assume that a packet sent in netsim is reliably delivered. In practice, TCP is used for communicating between nodes. In your implementation, you can assume that the rate of put() and get() operations is low enough that the network is never congested, so that packets are never dropped.
  • Node Dynamics: You can assume that nodes join the ring at the very beginning, before any put() or get() operations are invoked. In a practical setting, nodes may be added to the network or removed at any time.
  • Failure Model: You can assume that nodes never fail permanently. However, you will need to implement transient failure, in which a physical node is temporarily "down" and silently drops packets when it is in that state.
  • Centralized Map: You can assume that all nodes have a map of the positions on the ring occupied by each virtual node. In practice, this map is usually disseminated using a "gossip" protocol.

Getting Started

To get started on your project, do the following steps:

  • Create a new directory at the same level as cbr_app and netsim called dynamo.
  • Copy main.cpp, Makefile, and app_config.cpp from cbr_app into dynamo. You should not need to change the first file, but will need to change the latter two.
  • In this dynamo directory, you will develop your Dynamo implementation. You will need to add a few files: in general, it would be a good idea to implement each distinct C++ class in a separate file, with its own header. Be sure to appropriately update the Makefile (you need to read about makefiles and how they work; this is left as an exercise for you).

Assuming you have correctly followed the steps above, when you type

% make

in the dynamo directory, your project should compile into one executable called netsim_app. This program takes the name of a configuration file as an argument (specified using the -f flag). In the next section, we describe what you should implement as part of the configuration file.

Input

Netsim supports one configuration command that is relevant to your project. The Link(x,y) command specifies that a topology should be created from node x to y in the simulator. (In general, nodes nodes are sequentially numbered using positive integers starting from 1).

In addition, your program will need to support several additional configuration commands:

  • physnode(u_int x) declares a physical node whose identifier is x. This command should create a physical node, and associated virtual nodes which join the ring at the appropriate location. It is assumed that all physical nodes are created before the simulation starts. This command must come before the Link commands.
  • put(ID id, Time t, char* k, char* v) specifies that a put should be executed at time t of the simulation on virtual node id. This put command stores the key k and value v in Dynamo. v will be specified as a double-quote delimited string (e.g. "this is a value").
  • get(ID id, Time t, char* k) specifies that a get should be executed at time t of the simulation on virtual node id. This get command retrieves the value v associated with key k.
  • stop(Time t, ID id) stops the physical node whose ID is id. A stopped node silently drops all received packets.
  • resume(Time t, ID x) resumes the physical node whose ID is id. A resumed node responds to all received packets, and assumes the same position in the ring that it had before it was stopped. The stop and resume commands simulate transient failure of physical nodes.

Output

You should use Netsim's TRACE facility to print program output. The TRACE facility is defined in netsim/common.h.

Specifically:

  • When a virtual node adds itself to the ring, you should print a line of the following form in your program:

    TRACE(TRL3, "Virtual node ID %d joined ring location %#x\n", id, loc)

    where id is the virtual node identifier (defined in the implementation section below), and the ring location is a 32-bit integer.

  • When a virtual node with id completes a put command for key k and value v, your program should print:

    TRACE(TRL3, "Virtual node ID %d put key %s value %s at nodes %s\n", id, k, v, n)

    where n is a string containing the list of physical nodes at which k was stored.

  • When a virtual node with id completes a get command for key k, your program should print:

    TRACE(TRL3, "Virtual node ID %d get key %s value %s from nodes %s\n", id, k, v, n)

    where v is the value retrieved, and n is a string containing the list of physical nodes at which v was stored. In general, n will contain the nodes that satisfy quorum when the value is replicated.

  • When a physical node id is stopped, your program should print:

    TRACE(TRL3, "Physical node ID %d stopped\n", id)

  • When a physical node id is resumed, your program should print:

    TRACE(TRL3, "Physical node ID %d resumed\n", id)

Note that the TRACE facility prints out the simulation virtual clock time, so network delays are taken into account in the trace.

Implementation Design

In this section, we describe some requirements that your implementation should satisfy. We also suggest an implementation strategy, but you should feel free to pick a different strategy.

Requirements

  • The ring identifier space is 32-bits.
  • Physical nodes are sequentially numbered starting from 1. Netsim will send packets only to physical nodes.
  • Virtual nodes must be given an ID based on the associated physical node. The i-th virtual node on physical node j should have an ID equal to 100*j + i. (You can assume that there are fewer than 100 physical nodes in the system).
  • You will use the SHA-1 hash function implementation provided for mapping the ID or the key to a location on the ring.

Suggested Implementation Structure

  • Packet structure: You will need to add appropriate structure to the packet header. See how this is done in cbr_app.
  • Physical node class: You will need to create a separate class for a physical node, derived from FIFONode. This class must maintain the associated virtual nodes, and must implement transient failure.
  • Virtual node class: You will need to create a separate class for a virtual node, derived from physical node class. This class will contain the logic for the put and get operations, and will implement the partitioning and quorum algorithms.
  • Virtual node map class: This class maintains the mapping from virtual node to position on the ring, and can determine which virtual nodes are responsible for which part of ID space. This will be accessed by each virtual node in order to determine where to put or get a key.

This is not a complete list of the functionality you will need to implement. Also, feel free to deviate from these suggestions if you find a better or more convenient way to implement the project.

Submitting The Project

You will upload your project to the class Moodle (follow the ``Submitting the Project'' link).

Follow these steps to prepare your submission. First, remove all .o and backup files from your dynamo directory. You should only be left with the Makefile, source and header files. Then, inside the dynamo directory, execute the following command:

% tar zcvf ../project.tgz .

This will create a project.tgz file in the parent directory. Simply upload this file to the moodle.

How to Do Well in the Project

  • Follow the instructions exactly.
  • Do not deviate from the specified output format; do not change capitalization, do not add or remove extra characters, etc. Basically, you should make sure that we can write a script to check your output.
  • Follow the submission instructions exactly.
  • Create a few test cases of your own, designed to test different aspects of the program.
  • Work through those test cases by hand, to make sure the program's outputs are correct.
  • Whenever you change the code (and especially for last minute changes), re-test your code against the test cases. Most bugs are introduced by incremental changes to the code.