Market Data Architecture
Overview
Scope
The scope of the Market Data Network Architecture (MDNA) includes the sources of the market data
streams (stock exchanges), the Financial Service Providers (FSPs), and the final consumers of the data
(brokerage houses).
The network design and strategy for the brokerage houses is consistent with the Trading Floor
Architecture, which is described in the Trading Floor Architecture document.
Figure 1    Market Data Distribution Ecosystem: multiple content providers feed a Financial Services Provider, which distributes the market data to multiple brokerages
Brokerage Houses
These are the ultimate consumers of the market data and the people that place the orders.
Software/Services/System Integrators
Integrators are companies that are part of the Financial Services Ecosystem (FSE) and that create
products and services to tie everything together.
The delivery of these data streams is typically over a reliable multicast transport protocol. Traditionally
this has been TIBCO Rendezvous. TIBCO Rendezvous operates in a publish and subscribe environment.
Each financial instrument is given a subject name such as CSCO.last. Each application server can request
the individual instruments of interest by subject name and receive just that subset of the information.
This is called subject-based forwarding or filtering. Subject-based filtering is patented by TIBCO.
A distinction should be made between the first and second phases of the market data delivery. The
delivery of the market data from the exchange to the brokerage is usually considered a unidirectional
one-to-many application. In practice, most exchanges transmit the market data stream from several
servers simultaneously which makes the service more of a few-to-many implementation.
The only exception to the unidirectional nature of the market data might be the retransmission requests,
which are typically sent using unicast. However, the trading applications in the brokerage are definitely
many-to-many applications and might interact with the exchanges for placing orders.
Order Execution
After the market data is received by the brokerage firms, it is used as a basis for the order execution.
The following is a summary of the steps for order execution:
1. Market data is received in the ticker plant: the data stream is normalized, formatted, processed, and republished.
2. The order is sent to the Order Management System (OMS): the order is logged and passed to the Smart Routing Engine (SRE), which chooses the execution venue based on price, liquidity, latency, volume, transaction cost, and so on.
3. The order is sent to the Financial Information Exchange (FIX) engine, which sends the trade to the exchange.
4. The Market Making Engine matches sellers to buyers based on published bid and ask prices.
5. The seller is matched to the buyer and the order is executed.
Figure 2    Market Data Distribution Components: the exchange data center, the service distribution network, the Financial Service Providers, and the brokerage house (the A feed originates from Location A and the B feed from Location B; the brokerage data center feeds the traders)
The Exchange Data Center: Contains the servers that set the prices for the financial instruments
and process the orders.
The Service Distribution Network: Transmits the feeds out to the service edge, which feeds the
brokerages that have Direct Market Access (DMA) and the FSPs. The FSPs feed their brokerage
customers and may normalize the data and add their own analytics to the data stream.
Many exchanges outsource the service distribution network to a provider so that they can focus on their
core business.
The Provider Distribution Network: The physical network design that allows the FSP to have a regional
and global service presence between the exchanges and their customers. The network design is very
similar to that of a traditional service provider, but may contain more redundancy than is typically
deployed.
The Service Edge: Includes all the infrastructure needed to deliver the market data streams to the consumers.
This typically includes all the access technologies in the POPs and must support the
necessary security and service policies as defined through the provisioning process.
Business-to-Business Services
The FSPs also offer business-to-business services for their customers. This allows customers to have
direct peer-to-peer networks for specific applications and form closed user groups. Typically, this is done
between the large brokerage houses that engage in direct, high-volume transactions.
The Brokerage Back-Office Network: Traditionally in investment firms, the back office contains the
administrative functions that support the trading of securities, including record keeping, trade
confirmation, trade settlement, and regulatory compliance. In terms of market data distribution, this
includes the feed handlers, the OMS, the FIX engine, the algorithmic trading engines, and all
the network infrastructure to support those functions. The back-office infrastructure is typically
protected from external network connections by firewalls and strict security features.
The Brokerage Front-Office Network: The front office contains the end-user trading systems,
compliance and performance monitors, and all the network infrastructure to support them.
The market data feeds are brought into the data center in the back office where they are normalized and
processed. The feeds then get injected into the messaging bus which delivers the data streams to the front
office trading floor. This is typically done with some type of reliable multicast messaging protocol such
as TIBCO Rendezvous or 29 West's LBM. A different set of multicast groups and infrastructure is used
to distribute the streams to the front office applications.
Each architectural component (the service edge, the brokerage back-office network, and the brokerage
front-office network) can be summarized in terms of its platforms, technologies, and protocols.
The key components of the PIM SM implementation are:
PIM SM
Static RP
Anycast RP
Details of Anycast RP and the basic design can be found in the Common Best Practices for Market Data
Delivery section on page 25. An Anycast RP deployment is described in detail in the Anycast RP
whitepaper.
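As an illustration of the Anycast RP building block, the following is a minimal sketch of one of two redundant RPs, using hypothetical addresses: both routers share the Anycast RP address 10.0.0.1 on a loopback, advertise it into the IGP, and peer with each other over MSDP using their unique loopback addresses so that active-source information stays synchronized.

interface Loopback0
 description Shared Anycast RP address (identical on both RPs)
 ip address 10.0.0.1 255.255.255.255
 ip pim sparse-mode
!
interface Loopback1
 description Unique address used for MSDP peering
 ip address 10.0.1.1 255.255.255.255
 ip pim sparse-mode
!
ip pim rp-address 10.0.0.1
ip msdp peer 10.0.1.2 connect-source Loopback1
ip msdp originator-id Loopback1

The second RP mirrors this configuration with its own unique Loopback1 address (10.0.1.2 in this sketch) and an MSDP peer statement pointing back at 10.0.1.1.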
The classic high-availability design for TIBCO in the brokerage network is documented in
Financial Services Design for High Availability.
Bidirectional PIM
PIM BiDir is an optimization of PIM SM for many-to-many applications. It has several key advantages
over a PIM SM deployment:
Source traffic is automatically sent to the rendezvous point (RP) and then down to the interested
receivers. There is no unicast (register) encapsulation of source traffic, no PIM joins from the RP
toward the first-hop router, and no register-stop messages.
No SPT switchover
All PIM BiDir traffic is forwarded on a *,G forwarding entry. The router does not have to
monitor the traffic flow on a *,G and then send joins when the traffic passes a threshold.
No need for an actual RP
The RP does not have an actual protocol function in PIM BiDir. The RP acts as a routing vector
in which all the traffic converges. The RP can be configured as an address that is not assigned
to any particular device. This is called a Phantom RP.
No need for MSDP
Multicast Source Discovery Protocol (MSDP) provides source information between RPs in a
PIM SM network. PIM BiDir does not use the active source information for any forwarding
decisions and therefore MSDP is not required.
PIM BiDir is ideally suited for the brokerage network and the exchange data center. In these
environments, there are many sources sending to a relatively small number of groups in a many-to-many traffic pattern.
The key components of the PIM BiDir implementation are:
Bidirectional PIM
Static RP
Phantom RP
Further details of Phantom RP and basic PIM BiDir design are in the Bidirectional PIM Deployment
Guide.
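As a minimal sketch of the Phantom RP technique, assuming hypothetical addresses and OSPF as the IGP: two routers advertise the same loopback subnet with different mask lengths, and every router points at an address in that subnet that is not assigned to any device. Longest-match routing steers traffic toward the router advertising the /30 for as long as it is up.

! Primary candidate: advertises 192.168.100.0/30 into the IGP
interface Loopback0
 ip address 192.168.100.2 255.255.255.252
 ip pim sparse-mode
 ip ospf network point-to-point
!
! Secondary candidate: advertises 192.168.100.0/29, used only if the /30 disappears
interface Loopback0
 ip address 192.168.100.2 255.255.255.248
 ip pim sparse-mode
 ip ospf network point-to-point
!
! On all routers: the phantom RP address 192.168.100.1 is never assigned to any interface
ip pim rp-address 192.168.100.1 bidir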
Known limitations and considerations:
PIM BiDir RP Limitation: There is a limitation of four PIM BiDir RPs with the current Policy
Feature Card 3 (PFC3) on the Cisco 6500 and Cisco 7600. This might present a network design
limitation, although in many deployments four RPs will be sufficient.
The next version of the PFC will allow for eight PIM BiDir RPs.
Default Multicast Distribution Tree (MDT) Limitation: When an exchange or
brokerage uses MVPN for segmentation of its market data feeds in the core, there is a
limitation on the forwarding optimizations available with data MDTs.
When data is forwarded with a *,G entry in either PIM SM or PIM BiDir, the traffic is
forwarded on the default MDT regardless of the data rate. In other words, high-bandwidth sources
will not trigger a data MDT when used with PIM BiDir. This causes all the data sent with
PIM BiDir to be delivered to all the provider edge (PE) routers that are provisioned for that VPN
routing and forwarding (VRF) instance.
This might not be an issue, depending on the application behavior and the network design. If all
the data feeds must be delivered to all the PE routers, there is no loss of optimization. However, if
you need to prevent some of the high-bandwidth PIM BiDir groups from reaching all the PE routers,
you cannot do so with a Multicast VPN (MVPN) core.
General Risk Factor: PIM BiDir has been implemented since December 2000 on software-based
Cisco routers, and since April 2003 on the Cisco 6500 with hardware support with the release of the
Sup720. However, there are still a limited number of actual deployments in financial services
networks. Most financial customers are fairly risk averse and slow to adopt new technologies. PIM
BiDir has fallen into this category for a number of years, but testing and certification have moved
forward in a number of large exchanges and brokerage houses.
Source-Specific Multicast
PIM SSM is an optimization of PIM SM for one-to-many applications. In certain environments, PIM
SSM can offer several distinct advantages over PIM SM. Like PIM BiDir, PIM SSM does not rely on
any data-triggered events. Furthermore, PIM SSM does not require an RP; there is no such concept in
PIM SSM. The forwarding information in the network is completely controlled by the interest of the
receivers and the route to the source.
PIM SSM is ideally suited for market data delivery in the FSP. The FSP can receive the feeds from the
exchanges and then route them to the edge of their network.
Many FSPs are also implementing Multiprotocol Label Switching (MPLS) and MVPNs in their core.
PIM SSM is the preferred method for transporting traffic with MVPN.
When PIM SSM is deployed all the way to end users, the receiver indicates interest in a particular S,G
with Internet Group Management Protocol Version 3 (IGMPv3). Even though IGMPv3 was defined by
RFC 3376 back in October 2002, it still has not been implemented by all edge devices. This creates a
challenge for deploying an end-to-end PIM SSM service. A transitional solution has been developed by
Cisco to enable an edge device that supports Internet Group Management Protocol Version 2 (IGMPv2)
to participate in a PIM SSM service. This feature is called SSM Mapping and is documented at the
following URL:
http://www.cisco.com/en/US/products/sw/iosswrel/ps5207/products_feature_guide09186a00801a6d6f.
html
While SSM Mapping allows an end user running IGMPv2 to join a PIM SSM service, it requires IGMP
to operate at the service edge. This problem can be solved with IGMP Mroute Proxy, described
in the IGMP Mroute Proxy section on page 13. A better solution would be a service called PIM
Mapping, which would allow a PIM *,G join to be translated into a PIM S,G join at the service
edge. This potential new feature, currently being investigated, could provide an easy method to
interface between providers and their customers.
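The following is a minimal SSM Mapping sketch for the service edge, reusing the group range from the earlier static forwarding example and a hypothetical market data source (192.0.2.10); the actual groups, sources, and SSM range are provider-specific. IGMPv2 (*,G) reports matching access list 11 are mapped to the configured source, and the group range is enabled for SSM operation.

ip igmp ssm-map enable
no ip igmp ssm-map query dns
! Map IGMPv2 joins for the market data groups to the hypothetical source 192.0.2.10
ip igmp ssm-map static 11 192.0.2.10
access-list 11 permit 224.0.2.64 0.0.0.15
!
! Treat the market data range as SSM (the default SSM range is 232/8)
ip pim ssm range 12
access-list 12 permit 224.0.2.0 0.0.0.255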
Table 2 summarizes the PIM protocol options.

Table 2    PIM Protocol Summary

                         PIM SM                     PIM BiDir                   PIM SSM
Applications             One-to-many, many-to-one   Many-to-many                One-to-many
Intermittent Sources     Potential issue            No problem                  No problem
Network Deployment       Anywhere                   Exchange DC or brokerage    FSP or MVPN
Number of Deployments    Most                       Least                       Many
Provisioning Options
FSPs and the exchanges need a method to provision services for customers. The trade-offs are
administrative overhead, security and simplicity. This section describes the following options:
Static Forwarding
Static forwarding has traditionally been the first choice for provisioning market data services. The
description of static forwarding provided in this publication includes the following sections:
To simplify the configuration for dozens or hundreds of groups, the static group range
command has been added. The following is an example:
class-map type multicast-flows market-data
group 224.0.2.64 to 224.0.2.80
interface Vlan6
ip igmp static-group class-map market-data
Minimal Coordination: Static forwarding requires very little coordination between the content
provider and the customer. The provider is responsible for putting the packets on the wire and the
customer must capture them. None of these relationships need to be negotiated:
PIM neighbors
Designated router (DR)
RP information
MSDP peering
Clear Demarcation: There is a clear separation between the customer network and the provider
network from an ownership perspective. They are essentially separate multicast domains, with each
responsible for its own part. This separation reduces finger pointing and simplifies troubleshooting.
Another advantage is that both the customer and the provider are free to choose which flavor of multicast
they implement in their own domains (for example, PIM SM, PIM BiDir, or PIM SSM).
As for disadvantages, the main drawback of static forwarding is that the customer is unable to
dynamically control subscriptions and bandwidth usage on the last mile. As the data rates for market
data from the exchanges continue to climb month by month, this becomes more of an issue.
Virtual RP with Static Forwarding
The main trick is to point the customer routers to an RP with an address that is reachable out the upstream interface.
IGMP from the downstream receivers will trigger PIM (*,G) messages from the last hop router (LHR)
toward the virtual RP. This will create (*,G) state on the customer edge router with an incoming interface
pointing to the provider and an outgoing interface toward the receivers. The ingress traffic will be
forwarded out the egress interface on the (*,G) forwarding tree.
This approach is used by many customers and applies equally well to PIM BiDir or PIM SM.
A benefit of using the virtual RP is that the RP address can be carved out of the customer's address range.
This avoids injecting the provider's address range into the customer network.
In Figure 3, the virtual RP address is injected into the customer network with a static route that is
redistributed into the interior gateway protocol (IGP).
Figure 3    MD Distribution: Virtual RP

The figure shows a source (10.2.2.2) in the market data source network sending to the destination group 224.0.2.64, with the virtual RP address 10.1.1.1 pointing out the customer's upstream interface. The configuration excerpts from the figure are as follows.

Static forwarding of the market data group on the hand-off interface:

interface Ethernet0
 ip address 10.1.2.1 255.255.255.0
 ip pim sparse-mode
 ip igmp static-group 224.0.2.64

Customer edge router, with the virtual RP reached via a static route that is redistributed into the IGP:

interface Ethernet1
 ip address 10.1.2.2 255.255.255.0
 ip pim sparse-mode

ip pim rp-address 10.1.1.1
ip route 10.1.1.1 255.255.255.255 10.1.2.5
router ospf 11
 network 10.1.0.0 0.0.255.255 area 0
 redistribute static
Another advantage of the virtual RP is that because traffic is forwarded on the *,G entry, the packets
pass the RPF check against the virtual RP address, not the source address of the packet. Again,
this saves the customer from having to redistribute the source address range in their network. The
customer can add the command ip pim spt-threshold infinity on all of the LHRs, if desired, to
prevent the LHRs from sending PIM (S,G) joins and creating (S,G) state. The Reverse Path Forwarding
(RPF) check and the PIM (S,G) joins are not issues if the customer uses PIM BiDir.
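A minimal sketch of that option, assuming the market data groups fall in a hypothetical 224.0.2.64/28 range; the group-list keeps other multicast applications on their normal shortest-path-tree behavior:

! On the last hop routers: stay on the shared tree for the market data groups only
ip pim spt-threshold infinity group-list 13
access-list 13 permit 224.0.2.64 0.0.0.15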
Dynamic Forwarding
The data rates for market data have been increasing at an alarming rate. OPRA
(www.opradata.com) data rates are the most challenging in the industry; today they peak at over
500,000 messages per second, and that number is expected to increase steadily for the foreseeable future.
In the past, customers would receive the entire data stream and process just the subset that they needed.
The increased data rates have driven up transmission costs to the point that it is now economically
desirable to limit the amount of traffic by dynamically requesting a smaller portion of the available data
streams.
Dynamic subscriptions give the subscriber the ability to request individual multicast streams. This gives
the customer the ability to manage the bandwidth usage on the last mile.
Another driver for dynamic subscription requests is the move to 24-hour trading. Customers need to
archive all the market prices throughout the day to analyze trends for long-term forecasting. Many
exchanges retransmit the entire trading day after the market closes so that customers can capture any
missed packets. As the exchanges move closer to 24-hour operation, they will not be able to retransmit
the entire trading day, and customers will need to dynamically request a portion of the data stream.
The description of dynamic forwarding provided in this publication includes the following sections:
IGMP Mroute Proxy
With unicast services such as DHCP and DNS, the customer edge router must act like a host to the
provider: it participates in the DNS/DHCP exchange and then provides that information down to the
hosts. With IP multicast, the process is similar, but in reverse. The router must proxy the IGMP
messages from the hosts up to the service interface.
Many customers like this approach because it provides a clean interface dividing the domains. The
customer network simply looks like a host to the provider, not an internetwork.
A combination of igmp proxy commands can make this conversion possible. Figure 4 shows a typical
config example.
Figure 4    IGMP Mroute Proxy Example

The figure shows the customer edge router between the market data source network (10.4.4.0/24, reached through the IGMP helper address 10.4.4.4) and the customer PIM domain. The configuration from the figure is as follows:

interface Loopback1
 ip address 10.3.3.3 255.255.255.0
 ip pim sparse-mode
 ip igmp helper-address 10.4.4.4
 ip igmp proxy-service
 ip igmp access-group filter-igmp-helper
 ip igmp query-interval 9

interface Ethernet0
 ip address 10.2.2.2 255.255.255.0
 ip pim sparse-mode
 ip igmp mroute-proxy Loopback1
All the interesting configuration is placed on the customer edge router. The rest of the routers in the
customer network are configured with standard PIM SM and point to an RP in the provider network. This
RP is not required; only a route to the RP address pointing out the upstream interface is needed. An example
using a virtual RP with static forwarding is discussed in the Virtual RP with Static Forwarding section
on page 11.
The steps in the IGMP mroute proxy process are as follows:
1. The host application triggers an IGMP membership report, which is sent to the LHR.
2. (*,G) state is created on the LHR and triggers a PIM (*,G) join message to be sent toward the RP.
3. The PIM join message filters up through the network toward the provider service edge.
4. The PIM (*,G) join is received on the downstream interface of the customer edge router. This causes (*,G) multicast state to be created on the edge router.
5. The creation of the (*,G) state and the presence of the mroute-proxy command on the downstream interface trigger an unsolicited IGMP membership report to be generated on the loopback interface.
6. The IGMP helper address on the loopback interface redirects the IGMP membership report to the upstream interface toward the market data service.
7. The market data feed is pulled down into the customer network.
8. As long as the mroute state is active on the customer edge router, IGMP membership reports continue to be sent toward the provider network.
Enhancements in IOS are underway to make this type of service compatible with the IGMP Proxy
described in RFC 4605.
PIM Neighbor Relationships: The provider edge must recognize the customer routers as PIM neighbors
so that it will accept their PIM join messages. Potential additional security options are PIM
neighbor filters (see the sketch after this list) and IP Security (IPsec) authentication for the neighbors.
Both of these methods trade additional maintenance for security.
RP Info: The provider needs to share its RP address with the customer. This would be an
Anycast or PriorityCast RP address, following the multicast best practices. Some providers
dynamically share this information with customers using AutoRP.
MSDP: This is the standard interdomain PIM SM solution and can be used for market data delivery.
It requires a peering relationship between the provider RP and the customer RP. Depending on the
number of customers served by the provider, there may be scaling considerations; a multi-layer
peering hierarchy would most likely be required.
Redundancy Issues: Using PIM SM or PIM BiDir leaves the source redundancy up to the
server side of the application. The standby server can monitor the stream from the primary server by
subscribing to the multicast group. If the stream stops for longer than a defined time, the secondary
server can start transmitting. The traffic is forwarded to all the receivers on the *,G (shared
tree) without any additional control plane activity. Alternatively, with PIM SM the standby server can
send periodic keepalives to maintain the S,G state while it is in standby mode so that an S,G mroute is
already established. PIM BiDir does not have this issue; every packet is delivered to the
receivers without any control plane involvement. There is more on this issue in the Intermittent
Sources section on page 28.
Service is unidirectional: The multicast service is unidirectional in nature. This is typically the case,
since the retransmission requests are sent using unicast.
Global source addresses: The provider uses a global source address that can be delivered
end-to-end. If the provider cannot use global source addresses, there might be an address
collision, and some type of network address translation (NAT) will be needed, which might
reduce performance and increase latency.
Support for IGMPv3: The application, the client OS, and the network must support IGMPv3. There
are a number of transitional features, such as SSM Mapping, but these might be problematic to
implement as part of the service.
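As referenced in the PIM Neighbor Relationships item above, the following is a minimal sketch of a PIM neighbor filter on a provider edge interface, assuming a hypothetical interface and customer router address; IPsec authentication of the neighbors would be configured separately.

interface GigabitEthernet0/1
 description Customer-facing interface at the provider service edge
 ip pim sparse-mode
 ! Accept PIM neighborship only from the agreed customer edge router
 ip pim neighbor-filter 20
!
access-list 20 permit 192.0.2.2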
Redundancy Issues
PIM SSM places the source redundancy requirements on the network or on the receiver side of the
application.
Figure 5    Exchange Data Center and Service Distribution Network: Data Center A (RP-A1, RP-A2) and Data Center B (RP-B1, RP-B2) are connected by a data center interconnect and feed POP 1 through POP 4
The market data feeds can be delivered either through a native IP unicast/multicast design or an MPLS
configuration. The MPLS configuration is typically used to achieve service separation with MPLS-VPN
and MVPNs.
The larger exchanges offer many different services; they are essentially several different exchanges
co-located together. The exchanges need a way to provision any of these services for their
customers while keeping customers separated and distinct. Service separation is a key method for
handling this issue.
We represent the RP address of the A feeds as RP-A-addr and the B feeds as RP-B-addr.
Each pair of RPs is configured as Phantom RPs. This type of configuration is described in the
Bidirectional PIM Deployment Guide. There is a link to the deployment guide and further discussion
about PIM BiDir in the IP Multicast Protocol Options section on page 7.
Each PoP would have routes that point back to the RPs in Data Center A or Data Center B using distinct
paths. The RPs would be defined as follows:
ip pim rp-address RP-A-addr 1 override bidir
ip pim rp-address RP-B-addr 2 override bidir
access-list 1 permit 233.255.255.0 0.0.0.127
access-list 2 permit 233.255.255.128 0.0.0.127
This design enables the exchange to deliver multicast market data services without any of the potential
control plane issues associated with PIM SM. However, there might be brokerages that are unable to
initially support PIM BiDir end-to-end and that require a transitional strategy.
The bidir-neighbor-filter command can be used on the customer-facing interfaces of the exchange to
ensure that designated forwarder (DF) election occurs even if the downstream router does not support
PIM BiDir. It also offers some control plane security, since it ensures that the exchange router is
always the DF.
interface GigabitEthernet3/15
ip pim bidir-neighbor-filter 9
access-list 9 deny any
This causes the source address for the default and data MDTs of VRF RED to be the IP address on the
loopback 2 interface.
Now that the source addresses for all the MDTs associated with this VRF are different from those of other VRFs,
the PEs at the bottom of the network can be configured to prefer one data path over another through
normal unicast routing methods, such as floating static routes. This can be used to direct one set of
streams down the left side of the network illustrated in Figure 5 and the rest down the right side
of the network.
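A minimal sketch of that approach on a receiving PE, using hypothetical addresses: 10.10.10.2 stands in for the Loopback 2 MDT source address of VRF RED, 10.1.1.1 is the next hop on the left side of the network, and 10.2.2.1 is the next hop on the right side.

! Prefer the left-side path toward the MDT source address of VRF RED
ip route 10.10.10.2 255.255.255.255 10.1.1.1
! Floating static route: the right-side path is used only if the left side fails
ip route 10.10.10.2 255.255.255.255 10.2.2.1 250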
Provisioning feeds from the exchange can be done in one of two ways.
Static Provisioning: This option is for FSPs that require the feeds all the time. These FSPs have no
local hosts to subscribe to the feeds and must distribute the data streams to their
customers without relying on end-to-end dynamic joins.
Dynamic Provisioning: This option is for brokerages that want to manage their own bandwidth consumption. The
recommended method for dynamic provisioning is PIM joins, which is described in detail in the
Dynamic Forwarding section on page 13.
This design ensures that the exchange will dependably deliver the multicast market data feeds even with
the typical behaviors seen today in the data streams, such as intermittent sources.
The customer facing interfaces should be configured with the appropriate safeguards as discussed in the
Edge Security section of the IP Multicast Best Practices for Enterprise Customers document.
Converged Network
In the converged core design (see Figure 6), traffic shares a single physical topology. Path separation is
accomplished by some type of traffic engineering. In terms of native multicast traffic this generally
means route manipulation.
Figure 6    Converged Network: the A and B feeds from the Financial Services Provider share a single MPLS-VPN/MVPN core that delivers both feeds to each brokerage
In the future, the traffic engineering for multicast might be done with Label Switched Multicast (LSM)
and point-to-multipoint traffic engineering (P2MP TE).
Separate Cores
In the separate core design (see Figure 7) there will be no requirement for traffic engineering. The path
separation will be accomplished by physical network separation.
Figure 7    Separate Cores: the A and B feeds from the Financial Services Provider are carried over physically separate MPLS-VPN/MVPN cores to the brokerages
The provisioning method in the FSP can be either static or dynamic, as is the case for the exchange
environment. The dynamic subscription model uses either IGMP or PIM joins; both are described in detail
in the Dynamic Forwarding section on page 13.
The customer facing interfaces should be configured with the appropriate safeguards as discussed in the
Edge Security section of the IP Multicast Best Practices for Enterprise Customers document.
Back-Office Network
The large brokerage houses are typically spread across multiple sites for many reasons, including real
estate space, power limitations, business continuance design, and so on.
The design presented in Figure 8 leverages the use of multiple sites to create greater efficiencies in
space, power, and business continuance. Each remote site has a feed pod. These feed pods are
independent and receive a different subset of the total market data feeds, which allows for source diversity.
The feed pods are connected with Dense Wavelength Division Multiplexing (DWDM) or 10-Gigabit
Ethernet so that each pod has access to all the feeds with the minimum latency penalty.
Figure 8    Brokerage Market Data Distribution: three sites (Site 1, Site 2, and Site 3), each with a feed pod receiving A and B feeds, feed handlers (Wombat, TIBCO, 29 West), execution engines, and a market data data center pod
The recommended method of multicast delivery for these feeds is PIM BiDir. The routers in each pod
are the RPs for the feeds received there. This is a very straightforward and elegant configuration, since a different
multicast address range is received in each pod.
The feeds are then pulled down into the market data data center (DC) pods, which process the data and execute the trades.
Each of the market data DC pods has access to all the feeds in the firm. This design allows the firm to
position people and servers at any of the three locations, whichever makes sense for space and power
consumption purposes. There is no penalty for having the servers in one location or another.
There are also connections between sites on the front office side of the market data DC pods. This allows
specialized analytics to be shared by trading applications throughout the firm.
The failure of any single link causes a minimal interruption of the data flow.
Front-Office Network
In each site, there is a front office network to distribute the analytics and market data to the various
trading applications and human traders. See Figure 9.
The data distribution is typically handled by a separate infrastructure from that of the back-office network.
In terms of multicast, this means a different set of RPs and usually a different set of multicast addresses.
The servers in the market data DC pods read in the raw feeds and then republish them with a different set of
sources and groups.
Figure 9    Front-Office Market Data Distribution: feeds from the content provider or Financial Service Provider enter the data center and are republished to the front office through two sets of RPs
If PIM BiDir is being used, then path diversity is achieved with two sets of RPs, each responsible for
half the multicast groups. This is the same technique used in the exchange data center described
previously.
If PIM SM is being used for multicast forwarding, then path diversity and reliability is achieved with a
combination of alternating designated router (DR) priority and dedicated networks. An example of this
approach is explained in the Alternating DR Priority section of the IP Multicast Best Practices for
Enterprise Customers document.
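As a minimal sketch of the alternating DR priority technique referenced above, assuming two hypothetical front-office distribution VLANs: each of the two redundant routers is given the higher DR priority on one of the VLANs, so each segment has a deterministic and different designated router.

! Router A: preferred DR on VLAN 10
interface Vlan10
 ip pim sparse-mode
 ip pim dr-priority 10
!
! Router B: preferred DR on VLAN 20
interface Vlan20
 ip pim sparse-mode
 ip pim dr-priority 10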
This document provides the best methods for optimizing multicast delivery by focusing on the following
design goals:
Resiliency
Path diversity
Redundancy
Load sharing or splitting
Latency
Security
Topics covered in the IP Multicast Best Practices for Enterprise Customers include:
IGP tuning
IGMP Snooping
MSDP timers
Alternating DR Priority
Edge Security
Propagation delay
Queuing delay
Middleware characteristics
Application architecture
Server/OS architecture
The approach to minimize latency must be a holistic effort that reviews the entire market data system
from end-to-end and focuses on reducing latency throughout the design.
Note
The areas with the most room for latency improvement and that can have the greatest impact are the
application and middleware components.
Application Issues
This section addresses the following application considerations:
Perhaps the biggest limitation is the IGMP stack on the host. The host needs to respond to queries for
each group at least once per minute. When you reach thousands of groups, this becomes a limitation,
especially when the host receives a general query and needs to respond for each group to which it has
subscribed. If there are many hosts connected to a single switch, processing the thousands of reports
from all the hosts will also be a limitation.
The application developers need to find a reasonable compromise between the number of groups and
breaking up their products into logical buckets.
Consider the NASDAQ Quotation Dissemination Service (NQDS) for example. The instruments are
broken up alphabetically as follows:
Another example is the NASDAQ TotalView service, which breaks down as illustrated in Table 3.

Table 3    NASDAQ TotalView Data Channels (each data channel has primary and backup multicast groups drawn from 224.0.17.32, 224.0.17.35, and 224.0.17.48 through 224.0.17.61)
This approach does allow for straightforward network and application management, but does not
necessarily allow for optimized bandwidth utilization for most users. A user of NQDS that is
interested in technology stocks and would like to subscribe only to CSCO and INTL would need
to pull down all the data for the first two groups of NQDS. Understanding the way the users will
pull down the data and then organizing it into the appropriate logical groups will optimize the
bandwidth for each user.
In many market data applications, optimizing the data organization is of limited value. Typically,
customers bring all the data into a few machines and filter the instruments there. Using more groups just
adds overhead for the stack and does not help customers conserve bandwidth.
Another approach might be to keep the groups down to a minimum level and use UDP port numbers to
further differentiate if necessary. The multicast streams are forwarded based on destination address, but
the UDP ports can be used to aid in filtering the traffic.
The other extreme would be to use just one multicast group for the entire application and then have the
end user filter the data. One multicast group may be sufficient for cases in which all hosts would be
receiving the majority of the financial instruments.
Intermittent Sources
A common issue with market data applications is when servers send data to a multicast group and then
go silent for more than 3.5 minutes. These intermittent sources might cause thrashing of state on the
network and can introduce packet loss during the window of time when soft state exists and when
hardware shortcuts are being created.
There are a few scenarios in which the outage can be more severe. One case would be if the source starts
sending again right around the 3.5 minute mark. At that point state has started to time out in some of the
routers along the data path and there might be inconsistent states in the network. This could create a
situation in which data from the source would be dropped for as long as a minute until state clears out
and then is created again on the intermediate routers.
On the Cisco 6500 and Cisco 7600 there are some additional platform-specific issues with intermittent
sources. Multicast flows are forwarded by hardware shortcuts on the Policy Feature Card (PFC) or
Distributed Forwarding Card (DFC). The statistics from these flows are maintained on the PFC/DFC and
are periodically updated to the Multilayer Switch Feature Card (MSFC). By default this update happens
every 90 seconds, but can be lowered to every 10 seconds by lowering the mls ip multicast
flow-stat-timer value to 1. Because of this delay in receiving the latest flow statistics for individual multicast
streams, it is possible that a source could go quiet for three minutes, start transmitting again, and still have its
mroute state removed for inactivity. This could cause an outage of an active stream for
one to two minutes, depending on the state of the network.
The best solutions for dealing with intermittent sources are PIM BiDir or PIM SSM, null
packets, periodic keepalives or heartbeats, and the S,G expiry timer. Each is described briefly in the short
discussions that follow; a configuration sketch of the S,G expiry timer option appears immediately below.
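The S,G expiry timer can be raised so that (S,G) state outlives the quiet periods of an intermittent source. A minimal sketch, assuming a hypothetical one-hour timer scoped to the market data groups, on platforms and software versions that support the command:

! Keep (S,G) entries for the market data groups alive through 3600 seconds of inactivity
ip pim sparse sg-expiry-timer 3600 sg-list 14
access-list 14 permit 224.0.2.64 0.0.0.15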
Null Packets
In PIM SM environments, a common method used to ensure that a forwarding state is created is to send
a burst of null packets to the multicast group before the actual data stream. The application needs to
effectively ignore these null data packets so they do not affect performance. The sources only need to
send the burst of packets if they have been silent for more than three minutes. A good practice would be
to send the burst if the source was silent for more than one minute.
Many financial applications send out an initial burst of traffic in the morning and then all well-behaved
sources will not have a problem.
RTCP Feedback
A common issue with real time voice and video applications that use Real-time Transport Protocol (RTP)
is the use of Real-Time Control Protocol (RTCP) feedback traffic. Unnecessary use of the feedback
option can create excessive multicast state in the network. If the RTCP traffic is not required by the
application it should be avoided.
Receivers can be implemented and configured to send RTCP feedback using unicast. This has the
advantage of allowing the server to still receive the feedback, but not create all the multicast state.
TIBCO Heartbeats
TIBCO Rendezvous has had the ability to use IP multicast for the heartbeat between the TIBCO
Information Caches (TICs) for many years. However, there are some brokerage houses that are still using
very old versions of TIBCO Rendezvous that use UDP broadcast support for the resiliency. This
limitation is often cited as a reason to maintain a Layer-2 infrastructure between TICs located in different
data centers. These older versions of TIBCO Rendezvous should be phased out in favor of the IP
multicast-supported versions.
Multicast multipath: This feature is used to load-balance multicast traffic between equal-cost
neighbors. Normally, PIM joins are forwarded to the PIM neighbor with the highest IP address when
there are multiple equal-cost alternatives. When this command is enabled, the PIM neighbor is
selected pseudo-randomly from the available equal-cost neighbors, resulting in load splitting of
traffic from different sources.
Multicast multipath should be disabled to guarantee that multicast traffic follows the equal-cost
path with the highest IP address. This feature is not enabled by default.
Cisco Express Forwarding (CEF) per-destination mode: Unicast routing can use a number of
different methods to forward traffic. The unicast forwarding method must be verified as being
compatible with the multicast forwarding path.
For example, CEF can be configured with per-destination or per-packet forwarding modes. The
per-destination mode guarantees that all packets for a given destination are forwarded along the
same path. In most cases, the per-destination option is the better choice, and it is the default
with CEF.
Port channel hashing: Port channeling is used to combine multiple physical links into
one logical channel. The physical path that any one traffic stream takes depends on a hashing
algorithm.
The options available for the hashing algorithm differ by switch platform and
software version, but a common load-balancing policy for the hash is a combination of the source
and destination IP addresses of the traffic stream.
Since RMDS traffic for each financial instrument is sent from one source address to various destination
addresses (unicast, broadcast, and multicast), it is possible that a different hash will be
selected for each packet stream.
The number of different paths chosen for a particular source can be minimized by choosing a
hashing algorithm that uses only the source address; such a hash works best with RMDS. This can be
configured globally in Cisco IOS with the following command: port-channel load-balance src-ip
Live-Live or Hot-Hot
The term Live-Live (also referred to as Hot-Hot) refers to the method of sending redundant data streams
through the network using path separation and dedicated infrastructure. For example, an A copy of the
streams is sent to one set of multicast groups and a B copy of the streams is sent using a second
set of multicast groups. Each of these groups is typically delivered over a parallel, but separate,
set of equipment to the end user, with complete physical path separation.
One of the main justifications for Live-Live is the requirement to not lose a single packet in the data
stream. When Live-Live is implemented with full physical path separation and redundant server
infrastructure for the A and B streams, it can provide resiliency for a failure in a server or in the
network. Live-Live can also be implemented in a converged network (no physical path separation);
there will still be resiliency for the servers, but not necessarily for a network failure.
Financial services have been doing this for many years. Live-Live is usually preferred over a reliable
multicast solution such as TIBCO Rendezvous, 29West, or Pragmatic General Multicast (PGM). One of
the limitations with a reliable multicast solution is that the retransmissions and overhead introduce
latency and delays. In the finance world today, there is an arms race to reduce latency. Every millisecond
is worth money and financial services organizations want reliability, but not at the expense of latency.
The Live-Live approach will allow for the minimum possible latency without the need for
retransmissions.
A-B In C Out
Many brokerages receive both the A and B streams in their data centers and then feed handlers are used
to arbitrate, normalize, and clean the data stream. The post-processed data stream is then injected into a
messaging bus that feeds the core infrastructure of the trading applications. The message bus typically
uses a reliable multicast transport protocol, such as TIBCO Rendezvous or 29West.
Some brokerages position algorithmic trading engines in parallel with the feed handlers and process the
raw feeds directly. This requires the trading engines to be able to parse the raw streams and clean the
inherent transmission errors. This is a high-maintenance procedure and is usually only performed by the
largest brokerages.
Usually, these new A and B streams are not Live-Live in that the end users do not have to perform
arbitration, but rather they are both forwarded using a reliable messaging bus and the infrastructure
is completely separate. Usually half the end users would receive the A stream through the left side
of the infrastructure and the other half would receive the B stream through the right half of the
infrastructure. This is the same design described in the Alternating DR Priority section of the IP
Multicast Best Practices for Enterprise Customers document.
If this design is implemented properly, the data will continue to operate with a failure in the provider at
the same time as a failure in the brokerage. For example, even with a failure in the provider with the
original A feed and a failure in the brokerage with the new B feed, half the end users should still be able
to receive the new A feed. Alternatively, with the A-B in C out strategy, if the C feed fails everything
stops.
Multisite
Some brokerages combine the benefits of Live-Live with the added strategy of using multiple sites. For
example, some brokerages receive the A feed in New York and the B feed in New Jersey. These streams
are then processed with one of the above methods and then republished.
Brokerages usually cannot justify receiving the A-B in New York and then A-B in New Jersey again.
When this can be justified, it leads to some interesting republishing schemes.
There are several ways to handle republishing the stream in this situation. One way would be a
hot-standby method: the New York stream is transmitted everywhere under normal
circumstances, and the servers in New Jersey listen to the stream and only start publishing if
the server in New York fails. More complicated schemes have been considered in which
each site receives the local stream all the time; when there is a failure, the servers switch to the
stream from the other site and republish that data.
Retransmission Models
IP multicast uses UDP, which provides no delivery guarantees. In order to guarantee the
delivery of every message, there must be a recovery method higher up in the stack. This section discusses the
following retransmission options:
Live-Live, page 33
Reliable Multicast
Reliable multicast protocols have some type of retransmission scheme to deal with the cases of dropped
packets. PGM and TIBCO Rendezvous (via TIBCO Reliable Data Protocol or TRDP) have similar
retransmission schemes which involve using negative acknowledgements (NAKs) and report
suppression.
PGM and TRDP have the ability to cache the stream and therefore limit the retransmissions to a local
area. PGM uses a Designated Local Repairer (DLR), and TIBCO Rendezvous has this ability in the RVRD
function.
Reliable multicast schemes are typically deployed in the brokerage and not in the FSP or the exchange.
Live-Live
Live-Live does not generally need a retransmission scheme. The redundant streams are what guarantee
delivery. However, there are still times when gaps exist in the data stream at the brokerage and
retransmissions are then needed for part of the data stream. In those situations, methods described in the
sections that follow are generally used.
Replay for the entire day is becoming more of a problem as the markets are moving toward a 24-hour
trading day. For example, CME is already trading 23.5 hours per day for five days a week. Replay of the
entire trading day is not possible. CME uses unicast retransmission requests and sends the replay traffic
to dynamically joined multicast groups.
Latency: Reliable multicast introduces latency for overhead and retransmissions when
compared to Live-Live.
Licensing: The licensing fees for the messaging protocols that use reliable multicast can add up
quickly in a large deployment. This might change in the future with the development of the open-source
messaging protocol AMQP.
Live-Live: Ideal for unidirectional market data in which all the receivers receive the full feed.
A message bus with reliable multicast is ideal for complex applications with many hosts sharing
information in a many-to-many situation. The two fit different application environments, but there is
some overlap.
Content filtering: Reliable multicast messaging usually has a built-in ability for some type of
subject-based filtering, which can limit the traffic forwarded to individual branches.
Service Segmentation
Service segmentation is a method by which network connectivity and application reachability can be
divided into separate virtual silos. The network services in terms of control plane and data plane
reachability are completely separate from one silo to another.
The requirements for network segmentation are applied equally to unicast and multicast. Many market
data products have both unicast and multicast components and the segmentation would need to apply to
both.
There are many reasons why financial organizations are implementing some type of service
segmentation. The key reasons are as follows:
Service provisioning: FSPs need the ability to provision whole groups of services to individual
customers in an incremental fashion. Service segmentation allows the providers to enable certain
services and limit the customer to those services.
Closed user group extranet services: Many providers are offering extranet services to their
customers. This arrangement allows a subset of customers to conduct business-to-business
operations in a private environment.
Partner networks: Many FSPs resell their services to partners. There are requirements to keep those
networks separate from their other production services.
Multicast VPN
Multicast VPN (MVPN) is an approach for delivering segmented services for multicast feeds that is
gaining traction in the financial community. It has already been implemented by some exchanges, and
several brokerage firms are looking at it.
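For context, the following is a minimal MVPN provisioning sketch on a PE router, reusing the VRF name RED mentioned later in this document; the route distinguisher, route targets, and MDT group addresses are hypothetical. The default MDT carries all multicast traffic for the VRF, and with PIM SM in the customer VRF, high-rate sources above the threshold move to a data MDT (as noted earlier, PIM BiDir traffic stays on the default MDT regardless of rate).

ip multicast-routing
ip multicast-routing vrf RED
!
ip vrf RED
 rd 100:1
 route-target export 100:1
 route-target import 100:1
 ! Default MDT shared by all PEs in the VPN; data MDTs for sources above 10 kbps
 mdt default 239.192.10.1
 mdt data 239.192.20.0 0.0.0.255 threshold 10
!
interface GigabitEthernet0/2
 description Customer-facing interface in VRF RED
 ip vrf forwarding RED
 ip address 192.0.2.1 255.255.255.0
 ip pim sparse-mode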
One method of provisioning segmented services is with 802.1Q and multiple VRFs. See Figure 10. The
interface between the provider and the customer is divided into a number of subinterfaces, each in a
different VRF. This is not considered MVPN because it does not require Border Gateway
Protocol (BGP) and the traffic is not encapsulated.
Figure 10    802.1Q and Multiple VRFs: market data servers A and B sit in VRF A and VRF B behind PE routers, the VRFs are carried across the MPLS/MVPN cloud, and both are delivered to the CE over a single physical pipe
The 802.1Q/multi-VRF approach allows the provider to offer multiple services and closed user groups
without the use of an extranet. An extranet implementation would be an alternate option to offer the same
type of services without using 802.1Q.
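A minimal sketch of the 802.1Q/multi-VRF hand-off described above, assuming hypothetical VRF names, VLAN IDs, and addressing; each subinterface places one service into its own VRF on the same physical pipe.

interface GigabitEthernet0/3.101
 encapsulation dot1Q 101
 ip vrf forwarding SERVICE-A
 ip address 198.51.100.1 255.255.255.252
 ip pim sparse-mode
!
interface GigabitEthernet0/3.102
 encapsulation dot1Q 102
 ip vrf forwarding SERVICE-B
 ip address 198.51.100.5 255.255.255.252
 ip pim sparse-mode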
Service Translation
FSPs and brokerages often require transforming market data streams to different multicast address
ranges. This section describes the main benefits and existing solutions to accomplish that goal.
Address collision: Many CPs and FSPs offer overlapping services in the administratively scoped
multicast range (RFC 2365).
Domain separation: Multicast destination NAT is a key tool for forwarding data streams between
two distinct multicast domains. Ideally, the customer edge router can appear as a receiver/host in
one domain and a source in the second domain.
Redundancy: NATing the multicast address allows the customer to create an A and a B stream. The
original stream can be forwarded through the network, and a new copy of the stream with a different
group address can also be forwarded.
Existing Solutions
The following existing solutions are available for FSPs and brokerages for transforming market data
streams to different multicast address ranges:
Multicast NAT: The feature that is called Multicast NAT today only has the ability to modify the
unicast source addresses, and it is not supported on hardware-based platforms.
It does, however, translate the unicast addresses in PIM control packets, including joins and registers.
A summary of the functionality for Multicast NAT, and a description of how it works on Cisco routers,
can be found at the following link:
http://www.cisco.com/en/US/tech/tk648/tk361/technologies_tech_note09186a008009474d.shtml
Limitations: Multicast NAT does not support translation of MSDP messages.
Multicast helper-map: Using this feature as a destination NAT for multicast traffic is an unsupported
side effect and is not recommended. This functionality might be removed in new Cisco IOS versions.
Multicast service reflection: Multicast service reflection is a feature that was added to the
software-based platforms. It is recommended in cases with moderate performance requirements.
The relevant Cisco documentation can be found at the following location:
http://www.cisco.com/en/US/products/ps6441/products_feature_guide09186a008073f291.html
Limitations:
No show commands to see active translations
No management information base (MIB) support for translation information
Only supported on software-based platforms
Does not modify any multicast control plane traffic (PIM, AutoRP)
Cisco Firewall Services Module (FWSM) on the Cisco 6500/Cisco 7600: The NAT functionality on the
Cisco FWSM includes the ability to translate the source and destination addresses of multicast
streams.
It is recommended for applications that require higher performance than multicast service reflection
can provide.
Known limitations:
Cisco FWSM does not NAT any control plane packets, which makes for some interesting use cases.
No ability to make two copies of one stream.
No ability to convert a unicast stream to multicast.