CSci 551: Fall 2009: Implement the Dynamo Key-value Store
Introduction
Many networking designs are often studied in a simulator because it can be logistically difficult and time-consuming to implement a design in a real system of routers and hosts. The most common kind of simulation is discrete-event simulation.
In this project, you are given an implementation of a discrete-event simulator written in C++. Such a simulator, which is usually implemented as a single UNIX process, mimics the behavior of a network using a discrete-event engine which manages the communication between routers. The crucial aspect of this engine is that it simulates time and will give you a realistic sense of the sequence of events in a real-world implementation.
To understand more about discrete-event simulation, you can search the Web and read the many sites devoted to explaining the concept. You should also download and read the code for the discrete event simulator called netsim.
For this project, you will implement a subset of the Dynamo key-value store, the technology underlying Amazon's shopping cart and other services, in netsim.
There are two parts to the project. Part 1 is due at 5pm on October 9th, 2009. Part 2 is due at 5pm on November 29th, 2009.
The Netsim Simulator
Netsim takes a topology description (and other information described below) as input. You can specify nodes and links in the topology, and the link bandwidths and propagation latencies. Netsim also implements code that lets you send a packet from any node in the network, to any other node using shortest-path routing. Each node has a limited size packet queue (whose size you can configure), and if a packet arrives at a node when this queue is full, that packet is dropped. (In this project, as we describe below, you may not be using this functionality of netsim).
Thus, netsim provides the abstraction of a datagram network, on top of which you can implement several networking protocols and subsystems. In a previous edition of this class, students implemented an almost complete version of TCP on top of this simulator! In general, to implement a specific network functionality, you would use C++ inheritance and define one or more C++ classes that inherit from a base class defined in netsim.
To give you a feel for how to implement a protocol on top of netsim,
we have implemented a simple constant bit-rate transmission protocol:
this protocol sends packets periodically from a specified sender
to a specified receiver.
See the cbr_app directory for how this protocol is implemented using
Netsim.
To get started on the project, we recommend that you follow these steps:
- Download the netsim code and extract it.
-
Print out the code and read it carefully to understand what
each C++ class does. Start from
Scheduler, and readPacketSchedulerandTimer: these three files implement the heart of the discrete-event engine.Topologyimplements the routing algorithms, whileFIFONodeimplements a router with a bounded packet queue. This code is not particularly well documented, but the level of documentation is fairly typical of what you will find when you are working on collaborative projects in industry or academia. - You may also need to familiarize yourself with the C++ Standard Template Library.
-
Look at the
cbr_appdirectory and familiarize yourself with our implementation of the constant bit rate (CBR) transmission protocol. As part of this, you should also understand the configuration language in Netsim. See theexamplessub-directory for examples of how to configure a network using Netsim, and how to start the CBR sender and receiver. -
A good way to get familiar with the code and how the simulator works
is to compile (using
make) and run thecbr_appcode, examine the trace output (you need to figure out how this works), and single step through the code using the Gnu debuggergdb.
Overview
Before getting started on the project, you should read the Dynamo paper thoroughly. Dynamo provides an interface for storing (putting) a key-value pair, and for retrieving (getting) the value given a key. It implements this key value storage system on a collection of servers using consistent hashing for load-balancing (you might also want to feed the Chord paper for the origins of this idea), and implements quorum-based replication for fault tolerance.
In this project, you will be implementing the following subset of Dynamo.
-
System Interface (Section 4.1): Your system should provide the
put()andget()interface calls for storing and retrieving key value pairs. - Partitioning Algorithm (Section 4.2): You will be implementing the partitioning algorithm where each server/node creates a number of virtual nodes (4 in your implementation). Each virtual node assigns itself a position on the ring determined by randomly hashing its identifier, and assumes responsibility for part of ring's identifier space.
- Replication Algorithm (Section 4.3): You will implement the replication algorithm by replicating each put on 5 distinct physical nodes (you can assume that any topology will contain at least five physical nodes).
- You do not need to implement versioning (Section 4.4).
-
Executing get() and put() operations (Section 4.5): you will be
implementing a simplified version of a partition aware strategy,
where every physical node on the ring will be able execute a
get()orput()operation. Each node will contain a map of the entire ring, using which it can contact the appropriate replicas. - Transient failures and quorum (Section 4.5): Your implementation will also support the transient system failures by permitting a read if at least 3 physical replicas have the key requested, and succeeding on a write only if 3 physical replicas are up.
- You will not need to implement hinted handoff (Section 4.6), replica synchronization to tolerate permanent failure (Section 4.7), or membership leaves and joins (Section 4.8).
Part 1
For part 1 of the project, you should implement:
- the system interface
- the partitioning algorithm
- put and get operations
Part 1 is due at 5pm on October 9th, 2009.
Part 2
For part 2 of the project, you should implement:
- replication
- transient failures and quorum.
Part 2 is due at 5pm on November 29th, 2009.
Simplifying Assumptions
Your implementation will make several simplifying assumptions, which will deviate from reality. This deviation will give you a sense for how far you are off from the actual deployed system.
- Topology: You can assume that each physical node has a direct network link to every other physical node in the system. In a practical setting, Amazon implements the service in a data center whose network topology resembles a tree (multiple hops between nodes).
-
Inter-node communication: You can assume that a packet sent
in netsim is reliably delivered. In practice, TCP is used for
communicating between nodes. In your implementation, you can assume
that the rate of
put()andget()operations is low enough that the network is never congested, so that packets are never dropped. -
Node Dynamics: You can assume that nodes join the ring at the very
beginning, before any
put()orget()operations are invoked. In a practical setting, nodes may be added to the network or removed at any time. - Failure Model: You can assume that nodes never fail permanently. However, you will need to implement transient failure, in which a physical node is temporarily "down" and silently drops packets when it is in that state.
- Centralized Map: You can assume that all nodes have a map of the positions on the ring occupied by each virtual node. In practice, this map is usually disseminated using a "gossip" protocol.
Getting Started
To get started on your project, do the following steps:
-
Create a new directory at the same level as
cbr_appandnetsimcalleddynamo. -
Copy
main.cpp,Makefile, andapp_config.cppfromcbr_appintodynamo. You should not need to change the first file, but will need to change the latter two. -
In this
dynamodirectory, you will develop your Dynamo implementation. You will need to add a few files: in general, it would be a good idea to implement each distinct C++ class in a separate file, with its own header. Be sure to appropriately update theMakefile(you need to read about makefiles and how they work; this is left as an exercise for you).
Assuming you have correctly followed the steps above, when you type
% make
in the dynamo directory, your project should compile into one
executable called netsim_app. This program takes the name of a
configuration file as an argument (specified using the -f
flag). In the next section, we describe what you should implement
as part of the configuration file.
Input
Netsim supports one configuration command that is relevant to your
project. The Link(x,y) command specifies that a topology should be
created from node x to y in the simulator. (In general, nodes
nodes are sequentially numbered using positive integers starting
from 1).
In addition, your program will need to support several additional configuration commands:
-
physnode(u_int x)declares a physical node whose identifier isx. This command should create a physical node, and associated virtual nodes which join the ring at the appropriate location. It is assumed that all physical nodes are created before the simulation starts. This command must come before theLinkcommands. -
put(ID id, Time t, char* k, char* v)specifies that a put should be executed at timetof the simulation on virtual nodeid. This put command stores the keykand valuevin Dynamo.vwill be specified as a double-quote delimited string (e.g. "this is a value"). -
get(ID id, Time t, char* k)specifies that a get should be executed at timetof the simulation on virtual nodeid. This get command retrieves the valuevassociated with keyk. -
stop(Time t, ID id)stops the physical node whose ID isid. A stopped node silently drops all received packets. -
resume(Time t, ID x)resumes the physical node whose ID isid. A resumed node responds to all received packets, and assumes the same position in the ring that it had before it was stopped. Thestopandresumecommands simulate transient failure of physical nodes.
Output
You should use Netsim's TRACE facility to print program output. The
TRACE facility is defined in netsim/common.h.
Specifically:
-
When a virtual node adds itself to the ring, you should print a line
of the following form in your program:
TRACE(TRL3, "Virtual node ID %d joined ring location %#x\n", id, loc)where
idis the virtual node identifier (defined in the implementation section below), and the ring location is a 32-bit integer. -
When a virtual node with
idcompletes a put command for keykand valuev, your program should print:TRACE(TRL3, "Virtual node ID %d put key %s value %s at nodes %s\n", id, k, v, n)where
nis a string containing the list of physical nodes at whichkwas stored. -
When a virtual node with
idcompletes a get command for keyk, your program should print:TRACE(TRL3, "Virtual node ID %d get key %s value %s from nodes %s\n", id, k, v, n)where
vis the value retrieved, andnis a string containing the list of physical nodes at whichvwas stored. In general,nwill contain the nodes that satisfy quorum when the value is replicated. -
When a physical node
idis stopped, your program should print:TRACE(TRL3, "Physical node ID %d stopped\n", id) -
When a physical node
idis resumed, your program should print:TRACE(TRL3, "Physical node ID %d resumed\n", id)
Note that the TRACE facility prints out the simulation virtual clock time, so network delays are taken into account in the trace.
Implementation Design
In this section, we describe some requirements that your implementation should satisfy. We also suggest an implementation strategy, but you should feel free to pick a different strategy.
Requirements
- The ring identifier space is 32-bits.
- Physical nodes are sequentially numbered starting from 1. Netsim will send packets only to physical nodes.
-
Virtual nodes must be given an ID based on the associated physical
node. The
i-th virtual node on physical nodejshould have an ID equal to100*j + i. (You can assume that there are fewer than 100 physical nodes in the system). - You will use the SHA-1 hash function implementation provided for mapping the ID or the key to a location on the ring.
Suggested Implementation Structure
-
Packet structure: You will need to add appropriate structure to the
packet header. See how this is done in
cbr_app. -
Physical node class: You will need to create a separate class for a
physical node, derived from
FIFONode. This class must maintain the associated virtual nodes, and must implement transient failure. -
Virtual node class: You will need to create a separate class for a
virtual node, derived from physical node class. This class will
contain the logic for the
putandgetoperations, and will implement the partitioning and quorum algorithms. - Virtual node map class: This class maintains the mapping from virtual node to position on the ring, and can determine which virtual nodes are responsible for which part of ID space. This will be accessed by each virtual node in order to determine where to put or get a key.
This is not a complete list of the functionality you will need to implement. Also, feel free to deviate from these suggestions if you find a better or more convenient way to implement the project.
Submitting The Project
You will upload your project to the class Moodle (follow the ``Submitting the Project'' link).
Follow these steps to prepare your submission. First, remove all
.o and backup files from your dynamo directory. You
should only be left with the Makefile, source and header files.
Then, inside the dynamo directory, execute the following
command:
% tar zcvf ../project.tgz .
This will create a project.tgz file in the parent directory.
Simply upload this file to the moodle.
How to Do Well in the Project
- Follow the instructions exactly.
- Do not deviate from the specified output format; do not change capitalization, do not add or remove extra characters, etc. Basically, you should make sure that we can write a script to check your output.
- Follow the submission instructions exactly.
- Create a few test cases of your own, designed to test different aspects of the program.
- Work through those test cases by hand, to make sure the program's outputs are correct.
- Whenever you change the code (and especially for last minute changes), re-test your code against the test cases. Most bugs are introduced by incremental changes to the code.