- Layered Protocols
- Low-level layers
- Transport layer
- Application layer
- Middleware layer
Basic networking model
ISO OSI model
- Drawbacks:
- Focus on message-passing only
- Often unneeded or unwanted functionality
- Low-level layers
Recap
- Physical layer: contains the specification and implementation of
bits, and their transmission between sender and receiver
- Data link layer: prescribes the transmission of a series of bits into
a frame to allow for error and flow control
- Network layer: describes how packets in a network of computers
are to be routed.
Observation
For many distributed systems, the lowest-level interface
is that of the network layer.
- Transport Layer
Important
The transport layer provides the actual communication facilities for
most distributed systems.
Standard Internet protocols:
- TCP: connection-oriented, reliable, stream-oriented
communication
- UDP: unreliable (best-effort) datagram communication
Note
IP multicasting is often considered a standard available service (which
may be dangerous to assume).
Middleware Layer
Observation
Middleware was invented to provide common services and protocols
that can be used by many different applications:
- A rich set of communication protocols
- (Un)marshaling of data, necessary for integrated systems
- Naming protocols, to allow easy sharing of resources
- Security protocols for secure communication
- Scaling mechanisms, such as for replication and caching
Note
What remains are truly application-specific protocols...
such as?
- Types of communication
We can view middleware as an additional service in
client-server computing (consider, for example, an email system).
Figure: Traditional client-server
Figure: Client-server with middleware
Distinguish:
- Transient versus persistent communication
- Asynchronous versus synchronous communication
Transient versus persistent:
- Transient communication: A communication server discards a message when
it cannot be delivered to the next server or to the receiver.
- Persistent communication: A message is stored at a communication
server as long as it takes to deliver it.
Asynchronous versus synchronous:
- Asynchronous communication: A sender continues immediately after
it has submitted the message for transmission.
- Synchronous communication: The sender is blocked until its request
is known to be accepted. Synchronization can take place at three points
(see figure above):
- At request submission
- At request delivery
- After request processing
Client/Server
Some observations
Client/Server computing is generally based on a model of transient
synchronous communication:
- Client and server have to be active at the time of communication
- Client issues a request and blocks until it receives a reply
- Server essentially waits only for incoming requests, and
subsequently processes them
Drawbacks of synchronous communication
- Client cannot do any other work while waiting for reply
- Failures have to be handled immediately: the client is waiting
- The model may simply not be appropriate (mail, news)
Messaging
Message-oriented middleware (MOM)
Aims at high-level persistent asynchronous communication:
- Processes send each other messages, which are queued
- Sender need not wait for immediate reply, but can do other things
- Middleware often ensures fault tolerance
- Remote Procedure Call (RPC)
Basic RPC operation
Observations
- Application developers are familiar with simple procedure model
- Well-engineered procedures operate in isolation (black box)
- There is no fundamental reason not to execute procedures on
a separate machine
Conclusion
Communication between caller &
callee can be hidden by using
procedure-call mechanism.
1 Client procedure calls client stub.
2 Stub builds message; calls local OS.
3 OS sends message to remote OS.
4 Remote OS gives message to stub.
5 Stub unpacks parameters and calls server.
6 Server returns result to stub.
7 Stub builds message; calls OS.
8 OS sends message to client's OS.
9 Client's OS gives message to stub.
10 Client stub unpacks result and returns to the client.
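The ten steps can be sketched in a single process. This is a minimal illustration under stated assumptions: JSON is used as the message encoding, the OS-to-OS steps (3 and 8) are replaced by a direct function call, and `add`, `PROCEDURES`, `transport` and `client_stub_add` are hypothetical names, not part of any real RPC system.

```python
import json

# Hypothetical server-side procedure: the "black box" the client wants to call.
def add(a, b):
    return a + b

PROCEDURES = {"add": add}

def server_stub(request_bytes):
    """Steps 4-7: unpack parameters, call the server procedure, pack the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

def transport(message_bytes):
    """Stand-in for steps 3 and 8 (OS-to-OS transfer); here just a local call."""
    return server_stub(message_bytes)

def client_stub_add(a, b):
    """Steps 1-2 and 9-10: pack parameters, send, wait for the reply, unpack it."""
    request = json.dumps({"proc": "add", "args": [a, b]}).encode("utf-8")
    reply = transport(request)  # the caller is blocked until the reply is back
    return json.loads(reply.decode("utf-8"))["result"]

print(client_stub_add(2, 3))  # 5
```

To the caller, `client_stub_add` looks like an ordinary local procedure; all communication is hidden inside the two stubs.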
RPC: Parameter passing
Parameter marshaling
There's more than just wrapping parameters into a message:
- Client and server machines may have different data
representations (think of byte ordering)
- Wrapping a parameter means transforming a value into a
sequence of bytes
- Client and server have to agree on the same encoding:
- How are basic data values represented (integers, floats, characters)
- How are complex data values represented (arrays, unions)
- Client and server need to properly interpret messages,
transforming them into machine-dependent representations.
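A minimal sketch of why client and server must agree on an encoding. It assumes, for illustration only, that both sides agree on big-endian ("network") byte order and a length-prefixed UTF-8 string; `marshal_record` and `unmarshal_record` are hypothetical helpers built on Python's standard `struct` module.

```python
import struct

def marshal_record(ident, name):
    """Encode (int32, string) as: 4-byte id, 4-byte length, UTF-8 bytes (big-endian)."""
    encoded = name.encode("utf-8")
    return struct.pack("!iI", ident, len(encoded)) + encoded

def unmarshal_record(data):
    """Decode a message produced by marshal_record."""
    ident, length = struct.unpack("!iI", data[:8])
    return ident, data[8:8 + length].decode("utf-8")

# Without agreement: a little-endian sender and a big-endian reader disagree.
wire = struct.pack("<i", 1)              # bytes 01 00 00 00
print(struct.unpack(">i", wire)[0])      # 16777216, not 1

# With agreement on one encoding, the value survives the round trip.
print(unmarshal_record(marshal_record(42, "héllo")))  # (42, 'héllo')
```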
RPC parameter passing: some assumptions
- Copy in/copy out semantics: while the procedure is executing, nothing can
be assumed about parameter values.
- All data that is to be operated on is passed by parameters. This excludes
passing references to (global) data.
Conclusion
Full access transparency cannot be realized.
Observation
A remote reference mechanism enhances access transparency:
- Remote reference offers unified access to remote data
- Remote references can be passed as parameters in RPCs
Asynchronous RPCs
Essence
Try to get rid of the strict request-reply behavior, but let the client
continue without waiting for an answer from the server.
(a) Traditional RPC
(b) Asynchronous RPC (no returned result required)
Deferred Synchronous RPCs
Variation
Client can also do a (non)blocking poll at the server to see whether
results are available.
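Deferred synchronous RPC can be sketched with futures. This is an analogy, not a real RPC binding: a local `ThreadPoolExecutor` stands in for the server, and `remote_square` is a hypothetical remote procedure whose `sleep` models network and processing delay.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_square(x):
    # Hypothetical remote procedure; the sleep stands in for network + processing time.
    time.sleep(0.1)
    return x * x

executor = ThreadPoolExecutor(max_workers=1)

# Asynchronous call: the client immediately gets a handle back and continues.
future = executor.submit(remote_square, 7)

other_work = sum(range(1000))  # the client does something useful meanwhile

# Non-blocking poll: has the result arrived yet?
if not future.done():
    print("result not yet available")

# Blocking poll: wait for the result (the deferred synchronization point).
print(future.result())  # 49
executor.shutdown()
```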
RPC in Practice
Client-to-server binding (DCE)
Issues: (1) the client must locate the server machine, and (2) locate the server on that machine.
- Message-Oriented Communication
- Transient Messaging
- Message-Queuing System
- Message Brokers
- Example: IBM WebSphere
Transient messaging: sockets
Berkeley socket interface
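The Berkeley primitives (socket, bind, listen, accept, connect, send, receive, close) map directly onto Python's standard `socket` module. A minimal sketch, assuming a TCP connection over the loopback interface; the upper-casing server is a hypothetical example service.

```python
import socket
import threading

def echo_server(listener):
    # Server side: accept one connection and echo the data back in upper case.
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

# Server: socket -> bind -> listen (accept runs in the thread)
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=echo_server, args=(listener,))
server.start()

# Client: socket -> connect -> send -> receive -> close
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect(("127.0.0.1", port))
    client.sendall(b"hello")
    reply = client.recv(1024)

server.join()
listener.close()
print(reply)  # b'HELLO'
```

Note that both processes must be running at the same time: this is transient communication.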
Message-Queuing Model
Loosely coupled communication using queues.
Sender and receiver can execute completely independently of each other.
Essence
Asynchronous persistent communication through support of
middleware-level queues. Queues correspond to buffers at
communication servers.
Basic interface to a queue in a message-queuing system:
Primitive | Meaning |
PUT | Append a message to a specified queue |
GET | Block until the specified queue is nonempty, and remove the first message |
POLL | Check a specified queue for messages, and remove the first. Never block |
NOTIFY | Install a handler to be called when a message is put into the specified queue |
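The four primitives can be sketched as a small class. This is a hypothetical in-memory wrapper around Python's `queue.Queue`, so unlike a real message-queuing system it is neither persistent nor distributed; `MessageQueue` and its method names are assumptions for illustration.

```python
import queue
import threading

class MessageQueue:
    """Hypothetical in-memory queue exposing PUT/GET/POLL/NOTIFY."""

    def __init__(self):
        self._q = queue.Queue()
        self._handlers = []
        self._lock = threading.Lock()

    def put(self, message):
        # PUT: append a message to the queue, then run any installed handlers.
        self._q.put(message)
        with self._lock:
            handlers = list(self._handlers)
        for handler in handlers:
            handler(self)

    def get(self):
        # GET: block until the queue is nonempty, then remove the first message.
        return self._q.get()

    def poll(self):
        # POLL: remove the first message if there is one; never block.
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

    def notify(self, handler):
        # NOTIFY: install a handler called whenever a message is put in the queue.
        with self._lock:
            self._handlers.append(handler)

mq = MessageQueue()
print(mq.poll())   # None: the queue is empty and POLL does not block
mq.put("order #1")
print(mq.get())    # order #1
seen = []
mq.notify(lambda q: seen.append(q.poll()))
mq.put("order #2")
print(seen)        # ['order #2']
```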
Message Broker
Observation
Message-queuing systems assume a common messaging protocol: all
applications agree on the message format (i.e., structure and data
representation), so the sender must produce its outgoing messages
in the format that the receiver expects.
Message broker
Centralized component that takes care of application heterogeneity in
an MQ system:
- Transforms incoming messages to target format
- Very often acts as an application gateway
- May provide subject-based routing capabilities (publish/subscribe), as used in
Enterprise Application Integration
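A minimal sketch of the two broker roles listed above: subject-based routing and conversion into each receiver's target format. `MessageBroker`, the subjects, and the CSV target format are hypothetical; a real broker would also queue messages persistently.

```python
from collections import defaultdict

class MessageBroker:
    """Hypothetical broker: subject-based routing plus per-receiver conversion."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # subject -> [(transform, deliver)]

    def subscribe(self, subject, deliver, transform=lambda m: m):
        self._subscribers[subject].append((transform, deliver))

    def publish(self, subject, message):
        # Route by subject and convert the message into each receiver's format.
        for transform, deliver in self._subscribers[subject]:
            deliver(transform(message))

# The sender produces dicts; one receiver expects CSV lines, another plain dicts.
broker = MessageBroker()
csv_inbox, dict_inbox = [], []
broker.subscribe("orders", csv_inbox.append,
                 transform=lambda m: f"{m['id']},{m['item']},{m['qty']}")
broker.subscribe("orders", dict_inbox.append)
broker.publish("orders", {"id": 7, "item": "disk", "qty": 2})
broker.publish("invoices", {"id": 1})  # no subscribers for this subject: dropped
print(csv_inbox)  # ['7,disk,2']
```

Because the sender no longer has to know the receivers' formats, the broker removes the "common message format" assumption of plain message queuing.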
IBM's WebSphere MQ
Basic concepts:
- All queues are managed by queue managers
- Application-specific messages are put into, and removed from
queues
- Queues reside under the regime of a queue manager
- Processes can put messages only in local queues, or through an
RPC mechanism
Message transfer:
- Messages are transferred between queues
- Message transfer between queues at different processes requires
a channel
- At each endpoint of a channel is a message channel agent
- Message channel agents (MCAs) are responsible for:
- Setting up channels using lower-level network communication
facilities (e.g., TCP/IP)
- (Un)wrapping messages from/in transport-level packets
- Sending/receiving packets
- Channels are inherently unidirectional
- Automatically start MCAs when messages arrive
- Any network of queue managers can be created
- Routes are set up manually (system administration)
Routing
By using logical names, in combination with name resolution to local queues,
it is possible to put a message in a remote queue
Entry in a routing table: (destQM, sendQ)
Local alias for queue manager names is used to improve management flexibility.
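The routing-table lookup can be sketched as follows. Everything here is hypothetical (queue-manager names, send-queue names, the alias map); it only shows the idea that a message for a remote queue manager is put into a *local* send queue chosen via a (destQM, sendQ) table, with local aliases resolved first. This is not the WebSphere MQ API.

```python
# Hypothetical routing table of one queue manager: destination QM -> local send queue.
ROUTING_TABLE = {"QMB": "SQ1", "QMC": "SQ1", "QMD": "SQ2"}
# Hypothetical local aliases for queue-manager names (meaningful only locally).
ALIASES = {"LA1": "QMC", "LA2": "QMD"}

def resolve(dest_qm):
    """Return the local send queue for a message addressed to dest_qm."""
    return ROUTING_TABLE[ALIASES.get(dest_qm, dest_qm)]

def put_remote(send_queues, dest_qm, dest_queue, body):
    # The message goes into a local send queue; an MCA later moves it over a channel.
    real_qm = ALIASES.get(dest_qm, dest_qm)
    send_queues.setdefault(resolve(dest_qm), []).append((real_qm, dest_queue, body))

send_queues = {}
put_remote(send_queues, "LA2", "payments", "msg-1")
print(send_queues)  # {'SQ2': [('QMD', 'payments', 'msg-1')]}
```

If QMD is later renamed, only the alias entry changes; applications that address "LA2" are unaffected.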
- Stream-oriented communication
- Support for continuous media
- Streams in distributed systems
- Stream management
Continuous media
Observation
All communication facilities discussed so far are essentially based on a
discrete, that is, time-independent, exchange of information.
Continuous media
Characterized by the fact that values are time dependent:
- Audio
- Video
- Animations
- Sensor data (temperature, pressure, etc.)
Transmission modes
Different timing guarantees with respect to data transfer:
- Asynchronous: no restrictions with respect to when data is to be
delivered
- Synchronous: define a maximum end-to-end delay for individual
data packets
- Isochronous: define a maximum
end-to-end delay and maximum delay variance
(jitter is bounded)
Stream
Definition
A (continuous) data stream is a connection-oriented communication
facility that supports isochronous data transmission.
Some common stream characteristics
- Streams are unidirectional
- There is generally a single source, and one or more sinks
- Often, the sink and/or source is a wrapper around hardware
(e.g., camera, CD device, TV monitor)
- Simple stream: a single flow of data, e.g., audio or video
- Complex stream: multiple data flows, e.g., stereo audio or
combination audio/video
Streams and QoS
Essence
Streams are all about timely delivery of data. How do you specify this
Quality of Service (QoS)? Basics:
- The required bit rate at which data should be transported.
- The maximum delay until a session has been set up (i.e., when an
application can start sending data).
- The maximum end-to-end delay (i.e., how long it will take until a
data unit makes it to a recipient).
- The maximum delay variance, or jitter.
- The maximum round-trip delay.
Enforcing QoS
Observation
There are various network-level tools, such as differentiated services,
by which certain packets can be prioritized.
Also
Use buffers at the receiver to reduce jitter.
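A playout-buffer sketch: the receiver delays playback by a fixed buffer time so that packets delayed by jitter still arrive before their slot. The arrival times and the 20 ms sending interval are made-up example values, and `playout_schedule` is a hypothetical helper.

```python
# Hypothetical example: packet i is sent at i * 20 ms; network delay varies (jitter).
SEND_INTERVAL = 20
arrivals = [105, 128, 141, 170, 181, 214]  # arrival times in ms

def playout_schedule(arrivals, buffer_delay):
    """Play packet i at first_arrival + buffer_delay + i * interval; list late packets."""
    start = arrivals[0] + buffer_delay
    schedule, late = [], []
    for i, arrived in enumerate(arrivals):
        play_at = start + i * SEND_INTERVAL
        schedule.append(play_at)
        if arrived > play_at:
            late.append(i)  # packet missed its playout slot: a gap in playback
    return schedule, late

print(playout_schedule(arrivals, 0)[1])   # [1, 3, 5]
print(playout_schedule(arrivals, 10)[1])  # []
```

The trade-off is visible in the parameter: a larger `buffer_delay` hides more jitter but adds the same amount to the end-to-end delay.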
Problem
How to reduce the effects of packet loss (when multiple samples are in
a single packet)?
The effect of packet loss in (a) noninterleaved transmission and
(b) interleaved transmission
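The effect of interleaving can be checked directly: with 16 samples and 4 samples per packet, losing one noninterleaved packet removes 4 consecutive samples, while losing one interleaved packet leaves only isolated single-sample gaps that are easier to conceal. `packetize` and `longest_gap` are hypothetical helpers for this illustration.

```python
def packetize(samples, per_packet, interleave):
    """Split samples into packets; interleaved packets hold every n-th sample."""
    n_packets = len(samples) // per_packet
    if interleave:
        return [samples[i::n_packets] for i in range(n_packets)]
    return [samples[i * per_packet:(i + 1) * per_packet] for i in range(n_packets)]

def longest_gap(samples, packets, lost_index):
    """Longest run of consecutive missing samples after losing one packet."""
    lost = set(packets[lost_index])
    longest = run = 0
    for s in samples:
        run = run + 1 if s in lost else 0
        longest = max(longest, run)
    return longest

samples = list(range(16))  # sample numbers, 4 per packet
plain = packetize(samples, 4, interleave=False)
mixed = packetize(samples, 4, interleave=True)
print(plain[1], longest_gap(samples, plain, 1))  # [4, 5, 6, 7] 4
print(mixed[1], longest_gap(samples, mixed, 1))  # [1, 5, 9, 13] 1
```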
Stream synchronization
Problem
Given a complex stream, how do you keep the different substreams in
synch?
Example
Think of playing out two channels, that together form stereo sound.
Difference should be less than 20-30 μsec!
Alternative
Multiplex all substreams into a single stream, and demultiplex at the
receiver. Synchronization is handled at multiplexing/demultiplexing
point (MPEG).
Time-division multiplexing
- Multicast communication
Multicast communication
- Application-level multicasting
- Gossip-based data dissemination
Application-level multicasting (ALM)
Essence
Organize nodes of a distributed system into an overlay network and use that
network to disseminate data.
Two approaches in organizing the network:
- a tree: unique paths between 2 nodes
- a mesh: multiple paths between 2 nodes (more robust)
Chord-based tree building
1 Initiator generates a multicast identifier (mid).
2 Look up succ(mid), the node responsible for mid (see also the previous chapter),
and promote that node to become the root of the tree.
3 The request is routed to succ(mid), which has become the root.
4 If P wants to join, it sends a join request to the root.
5 When the request arrives at Q:
- If Q has not seen a join request before, it becomes a forwarder for the group;
P becomes a child of Q. The join request continues to be forwarded.
- If Q already knows about the tree, P becomes a child of Q. There is no need to
forward the join request anymore.
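The join procedure in step 5 can be sketched as follows. As an assumption, the Chord route from a joining node to the root is given as a precomputed list of hypothetical node names rather than computed by an actual Chord lookup; `join` returns the number of hops the request travelled.

```python
def join(parent, in_tree, path):
    """Process a join request routed along `path` (joining node first, root last).

    parent:  dict mapping child -> parent in the multicast tree
    in_tree: set of nodes already in the tree (initially just the root)
    """
    if path[0] in in_tree:
        return 0
    in_tree.add(path[0])
    hops = 0
    for p, q in zip(path, path[1:]):
        hops += 1
        parent[p] = q          # P becomes a child of Q
        if q in in_tree:       # Q already knows about the tree: stop forwarding
            return hops
        in_tree.add(q)         # Q had not seen a join before: it becomes a forwarder
    return hops

parent, in_tree = {}, {"R"}                       # R is the root, succ(mid)
print(join(parent, in_tree, ["P", "A", "B", "R"]))       # 3: reaches the root
print(join(parent, in_tree, ["S", "C", "A", "B", "R"]))  # 2: stops at forwarder A
print(parent)  # {'P': 'A', 'A': 'B', 'B': 'R', 'S': 'C', 'C': 'A'}
```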
ALM: Some Costs
- Link stress: How often does an ALM message cross the same
physical link? Example: message from A to D needs to cross
(Ra,Rb) twice.
- Stretch: Ratio in delay between the ALM-level path and the network-level
path. Example: a message from B to C:
- B --> Rb --> Ra --> Rc --> C (total cost = 59)
- B --> Rb --> Rd --> Rc --> C (total cost = 47)
=> stretch = 59/47 ≈ 1.255.
Epidemic Algorithms
- General background
- Update models
- Removing objects
Basic idea
Assume there are no write conflicts:
- Update operations are performed at a single server
- A replica passes updated state to only a few neighbors
- Update propagation is lazy, i.e., not immediate
- Eventually, each update should reach every replica
Two forms of epidemics
- Anti-entropy: Each replica regularly chooses another replica at random,
and exchanges state differences, leading to identical states at both
afterwards
- Gossiping: A replica which has just been updated (i.e., has been
contaminated), tells a number of other replicas about its update
(contaminating them as well).
Anti-entropy
Principal operations:
- A node P selects another node Q from the system at random.
- Push: P only sends its updates to Q
- Pull: P only retrieves updates from Q
- Push-Pull: P and Q exchange mutual updates (after which they
hold the same information).
Observation
For push-pull it takes O(log(N)) rounds to disseminate updates to all
N nodes (round = when every node has taken the initiative to start an
exchange).
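The logarithmic behaviour of push-pull anti-entropy can be observed with a small simulation. This is a sketch under simplifying assumptions: a single update, uniform random partner selection, and exchanges within a round processed one after another (so an update may travel more than one hop per round). `push_pull_rounds` is a hypothetical helper.

```python
import math
import random

def push_pull_rounds(n, seed=0):
    """Number of push-pull rounds until all n nodes hold one update."""
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True                     # one server starts with the update
    rounds = 0
    while not all(updated):
        rounds += 1
        for p in range(n):                # every node takes the initiative once
            q = rng.randrange(n - 1)
            q = q + 1 if q >= p else q    # pick a random node other than p
            if updated[p] or updated[q]:  # push-pull: afterwards both have it
                updated[p] = updated[q] = True
    return rounds

for n in (100, 1000, 10000):
    print(n, push_pull_rounds(n), round(math.log2(n), 1))
```

The number of rounds grows roughly like log(N): multiplying N by 10 adds only a few rounds.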
Gossiping
Basic model
A server S having an update to report, contacts other servers. If a
server is contacted to which the update has already propagated, S
stops contacting other servers with probability 1/k.
Observation
If s is the fraction of ignorant servers (i.e., which are unaware of the
update), it can be shown that with many servers
$s = e^{-(k+1)(1-s)}$
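The equation has the trivial solution s = 1, so the interesting value of s has to be found numerically. A minimal sketch using fixed-point iteration from s = 0: since the right-hand side is increasing in s, the iterates rise monotonically to the smallest fixed point, which is the nontrivial one. `ignorant_fraction` is a hypothetical helper name.

```python
import math

def ignorant_fraction(k, tol=1e-12, max_iter=10_000):
    """Smallest solution of s = exp(-(k+1)(1-s)), by fixed-point iteration from 0."""
    s = 0.0
    for _ in range(max_iter):
        nxt = math.exp(-(k + 1) * (1 - s))
        if abs(nxt - s) < tol:
            return nxt
        s = nxt
    return s

for k in range(1, 6):
    print(k, f"{ignorant_fraction(k):.4f}")
```

For k = 1 roughly 20% of the servers remain ignorant (s ≈ 0.203); for k = 4 this drops to about 0.7%, which is why gossiping alone cannot guarantee that every server is updated.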
Note
If we really have to ensure that all servers are eventually updated,
gossiping alone is not enough
Deleting Values
Fundamental problem
We cannot remove an old value from a server and expect the removal
to propagate. Instead, mere removal will be undone in due time using
epidemic algorithms
Solution
Removal has to be registered as a special update by inserting a death
certificate
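A sketch of why a death certificate is needed. It assumes, for illustration, that every entry carries a timestamp and that an anti-entropy exchange keeps the newer entry; `Replica` and the `DEATH` marker are hypothetical, and this sketch never garbage-collects old certificates.

```python
DEATH = object()  # hypothetical marker stored in place of a deleted value

class Replica:
    def __init__(self):
        self.store = {}  # key -> (timestamp, value or DEATH)

    def write(self, key, value, ts):
        self.store[key] = (ts, value)

    def delete(self, key, ts):
        # Register the removal as a special update: a timestamped death certificate.
        self.store[key] = (ts, DEATH)

    def merge_from(self, other):
        # Anti-entropy exchange: keep whichever entry has the newer timestamp.
        for key, (ts, value) in other.store.items():
            if key not in self.store or self.store[key][0] < ts:
                self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None or entry[1] is DEATH else entry[1]

a, b = Replica(), Replica()
a.write("x", 1, ts=1)
b.merge_from(a)

a.store.pop("x")      # mere removal...
a.merge_from(b)
print(a.read("x"))    # 1: the old value has come back

a.delete("x", ts=2)   # removal as an update
a.merge_from(b)
b.merge_from(a)
print(a.read("x"), b.read("x"))  # None None
```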
- Naming
- Naming Entities
- Names are used to denote entities in a distributed system.
To operate on an entity, we need to access it at an access point.
Access points are entities that are named by means of an address.
- A location-independent name for an entity E is independent of the
addresses of the access points offered by E.
- Identifier
A name with the following properties:
- Each identifier refers to at most one entity
- Each entity is referred to by at most one identifier
- An identifier always refers to the same entity (prohibits reusing
an identifier)
- Flat Naming
Given an essentially unstructured name (e.g., an identifier), how can
we locate its associated access point?
- Simple solutions:
- broadcasting: cannot scale beyond LAN
- Forwarding pointers: When an entity moves, it leaves behind a pointer to its next location
- Home-based approaches
Use a home location to keep track of the current location of an entity.
- Distributed Hash Tables (DHT) (e.g. Chord system)
Organize many nodes into a logical ring:
- Each node is assigned a random m-bit identifier.
- Every entity is assigned a unique m-bit key.
- Entity with key k is associated with the node with the smallest id ≥ k
(called its successor, denoted by succ(k)).
- Nonsolution: Let node p keep track of succ(p + 1) as well
as its predecessor pred(p) and start a linear
search along the ring.
- Use Finger Tables:
- Each node p maintains a finger table $FT_p[]$ with at most m entries
(use mod $2^m$ arithmetic):
$FT_p[i] = succ(p + 2^{i-1}), \quad 1 \le i \le m$
Note: $FT_p[i]$ points to the first node succeeding p by at least $2^{i-1}$.
- To look up a key k, node p forwards the request to the node q with index
j satisfying
$q = FT_p[j] \le k < FT_p[j+1]$
(Stops when k ≤ q, which is the actual node.)
- If $p < k < FT_p[1]$, the request is also forwarded to $FT_p[1]$
e.g. Consider resolving k = 26 from node 1.
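A finger-table sketch for this example. The figure with the ring is not reproduced here, so the parameters are assumptions: m = 5 and the node set {1, 4, 9, 11, 14, 18, 20, 21, 28}. The lookup uses the standard Chord "closest preceding finger" rule; for this example it produces the same route as the index rule above.

```python
M = 5                                              # identifiers are m-bit numbers
RING = 2 ** M
NODES = sorted([1, 4, 9, 11, 14, 18, 20, 21, 28])  # assumed node identifiers

def succ(k):
    """Node responsible for key k: first node with id >= k, wrapping mod 2^m."""
    k %= RING
    for node in NODES:
        if node >= k:
            return node
    return NODES[0]

def finger_table(p):
    """FT_p[i] = succ(p + 2^(i-1)) for 1 <= i <= m (list index 0 holds i = 1)."""
    return [succ(p + 2 ** (i - 1)) for i in range(1, M + 1)]

def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(p, k):
    """Route a lookup for key k starting at node p; return the nodes visited."""
    path = [p]
    while not in_half_open(k, p, finger_table(p)[0]):
        nxt = finger_table(p)[0]
        for q in reversed(finger_table(p)):  # farthest finger that still precedes k
            if in_open(q, p, k):
                nxt = q
                break
        p = nxt
        path.append(p)
    path.append(finger_table(p)[0])          # k lies between p and its successor
    return path

print(finger_table(1))  # [4, 4, 9, 9, 18]
print(lookup(1, 26))    # [1, 18, 20, 21, 28]
```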
- Improvements (with modifications):
- topology-based assignment of node identifiers
- proximity routing
- proximity neighbour selection
- Hierarchical Location Service (HLS)
- Build a large-scale search tree for which the underlying network is
divided into hierarchical domains. Each domain is represented by a
separate directory node.
The root knows about every entity (but only as a pointer to the next level down)!
- HLS Tree organization
Address of entity E is stored in a leaf or intermediate node
Intermediate nodes contain a pointer to a child iff the subtree rooted at
the child stores an address of the entity
The root knows about all entities
- HLS Lookup operation
Basic principles
Start lookup at local leaf node
Node knows about E => follow downward pointer, else go up
Upward lookup always stops at root
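The HLS lookup rule (go up until a node knows about E, then follow downward pointers) can be sketched with a small tree. `DirNode`, `insert`, the domain names and the address string are hypothetical, and the sketch assumes a single address per entity.

```python
class DirNode:
    """Hypothetical directory node of a hierarchical location service."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.records = {}  # entity -> child DirNode (pointer) or address (at a leaf)

def insert(leaf, entity, address):
    """Store the address at the leaf and install pointers on the path to the root."""
    leaf.records[entity] = address
    child, node = leaf, leaf.parent
    while node is not None:
        node.records[entity] = child
        child, node = node, node.parent

def lookup(leaf, entity):
    # Go up until some node knows about the entity; the root is the last stop.
    node = leaf
    while entity not in node.records:
        if node.parent is None:
            return None  # even the root does not know the entity
        node = node.parent
    # Then follow downward pointers until an address is found.
    record = node.records[entity]
    while isinstance(record, DirNode):
        record = record.records[entity]
    return record

root = DirNode("root")
eu, us = DirNode("eu", root), DirNode("us", root)
nl, de, ca = DirNode("nl", eu), DirNode("de", eu), DirNode("ca", us)
insert(nl, "E", "10.0.0.7:5000")
print(lookup(de, "E"))  # up to eu, down to nl: 10.0.0.7:5000
print(lookup(ca, "E"))  # up to the root, then down via eu to nl
print(lookup(ca, "F"))  # None
```

A lookup from a nearby leaf (de) only climbs to the lowest common directory node (eu), which keeps local lookups local.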
- DNS vs. Chord
DNS | Chord |
provides a host name to IP address mapping | can provide the same service: name = key, value = IP |
relies on a set of special root servers | requires no special servers |
names reflect administrative boundaries | has no naming structure |
is specialized to finding named hosts or services | can also be used to find data objects that are not tied to certain machines |