
NANOG: load-balancing Facebook and interfacing IPv6 using LISP

Donn Lee talked about LISP Deployment at Facebook. No, not that LISP. This one:
In the current Internet routing and addressing architecture, the IP address is used as a single namespace that simultaneously expresses two functions about a device: its identity and how it is attached to the network. One very visible and detrimental result of this single namespace is manifested in the rapid growth of the Internet’s DFZ (default-free zone) as a consequence of multi-homing, traffic engineering (TE), non-aggregatable address allocations, and business events such as mergers and acquisitions.

LISP changes this by separating IP addresses into two new namespaces: Endpoint Identifiers (EIDs), which are assigned to end-hosts, and Routing Locators (RLOCs), which are assigned to devices (primarily routers) that make up the global routing system.
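To make the EID/RLOC split concrete, here is a minimal Python sketch of a mapping table that resolves an endpoint identifier to the routing locators that can reach it. Everything in it is an assumption for illustration: the `MAPPINGS` table, the `lookup_rlocs` and `pick_rloc` helpers, and the documentation-range addresses are made up, and the modulo-based choice among locators is just one simple way to spread flows. It is not the LISP wire protocol and not how Facebook deployed it.

```python
import ipaddress

# Hypothetical EID-prefix -> RLOC table. All prefixes and addresses are
# invented for illustration (documentation address ranges).
MAPPINGS = {
    ipaddress.ip_network("192.0.2.0/24"): ["198.51.100.1", "198.51.100.2"],
    ipaddress.ip_network("2001:db8:1::/48"): ["203.0.113.7"],
}


def lookup_rlocs(eid):
    """Return the RLOCs for the longest EID prefix covering `eid`, or []."""
    addr = ipaddress.ip_address(eid)
    matches = [
        net for net in MAPPINGS
        if net.version == addr.version and addr in net
    ]
    if not matches:
        return []
    best = max(matches, key=lambda net: net.prefixlen)
    return MAPPINGS[best]


def pick_rloc(eid, flow_id):
    """Pick one RLOC for a flow; returns None when no mapping exists.

    Taking flow_id modulo the number of locators is an assumed,
    deliberately simple way to spread flows across locators.
    """
    rlocs = lookup_rlocs(eid)
    if not rlocs:
        return None
    return rlocs[flow_id % len(rlocs)]
```

The point of the sketch is that the host keeps its identity (the EID) while the table decides where traffic for it is sent (the RLOC), so changing the table moves or spreads traffic without renumbering the host. The second entry also shows that an EID and its locator need not be in the same address family.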

Lee used LISP to load-balance Facebook, which you can try out here:

http://www.lisp4.facebook.com/.

If I understood him correctly, he said his group of network engineers did all this without needing to involve software development, because Facebook is still “a small, scrappy company” that permits and encourages such things.

-jsq

What we can learn from the Therac-25

What does Nancy Leveson’s classic analysis of the Therac-25 recommend? (“An Investigation of the Therac-25 Accidents,” by Nancy Leveson, University of Washington and Clark S. Turner, University of California, Irvine, IEEE Computer, Vol. 26, No. 7, July 1993, pp. 18-41.)
“Inadequate Investigation or Followup on Accident Reports. Every company building safety-critical systems should have audit trails and analysis procedures that are applied whenever any hint of a problem is found that might lead to an accident.” p. 47

“Government Oversight and Standards. Once the FDA got involved in the Therac-25, their response was impressive, especially considering how little experience they had with similar problems in computer-controlled medical devices. Since the Therac-25 events, the FDA has moved to improve the reporting system and to augment their procedures and guidelines to include software. The input and pressure from the user group was also important in getting the machine fixed and provides an important lesson to users in other industries.” pp. 48-49

The lesson is that you need built-in audit, reporting, transparency, and user visibility for reputation.

Which is exactly what Dennis Quaid is asking for.

Remember, most of those 99,000 deaths a year from medical errors aren’t due to control of complicated therapy equipment:

Route Hijacking: Identity Theft of Internet Infrastructure

Peter Svensson gives an old and quite serious problem some mainstream press in this AP story from 8 May 2010:
On April 25, 1997, millions of people in North America lost access to all of the Internet for about an hour. The hijacking was caused by an employee misprogramming a router, a computer that directs data traffic, at a small Internet service provider.

A similar incident happened elsewhere the next year, and the one after that. Routing errors also blocked Internet access in different parts of the world, often for millions of people, in 2001, 2004, 2005, 2006, 2008 and 2009. Last month a Chinese Internet service provider halted access from around the world to a vast number of sites, including Dell.com and CNN.com, for about 20 minutes.

In 2008, Pakistan Telecom tried to comply with a government order to prevent access to YouTube from the country and intentionally “black-holed” requests for YouTube videos from Pakistani Internet users. But it also accidentally told the international carrier upstream from it that “I’m the best route to YouTube, so send all YouTube traffic to me.” The upstream carrier accepted the routing message, and passed it along to other carriers across the world, which started sending all requests for YouTube videos to Pakistan Telecom. Soon, even Internet users in the U.S. were deprived of videos of singing cats and skateboarding dogs for a few hours.

In 2004, the flaw was put to malicious use when someone got a computer in Malaysia to tell Internet service providers that it was part of Yahoo Inc. A flood of spam was sent out, appearing to come from Yahoo.

The Pakistani incident is illustrated in the accompanying story and video by RIPE.
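One way a bogus “I’m the best route” announcement can win is longest-prefix matching: routers prefer the most specific prefix that covers a destination, whoever announced it. The Python sketch below illustrates only that mechanism. The `ROUTES` table, the `best_route` helper, the private-range prefixes, and the origin labels are all hypothetical, and real BGP route selection involves many more attributes than prefix length.

```python
import ipaddress

# Hypothetical routing table: (announced prefix, who announced it).
# The prefixes are private-range placeholders, not the real ones.
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/22"), "legitimate origin"),
    (ipaddress.ip_network("10.0.1.0/24"), "bogus origin"),
]


def best_route(destination, routes):
    """Return the origin of the most specific prefix covering `destination`.

    Returns None when no announced prefix covers the address.
    """
    addr = ipaddress.ip_address(destination)
    candidates = [(net, origin) for net, origin in routes if addr in net]
    if not candidates:
        return None
    # Longest prefix wins, regardless of who is entitled to announce it.
    net, origin = max(candidates, key=lambda route: route[0].prefixlen)
    return origin
```

In this toy table, `best_route("10.0.1.9", ROUTES)` picks the bogus origin because its /24 is more specific than the legitimate /22, while `best_route("10.0.2.9", ROUTES)` still reaches the legitimate origin. Nothing in the lookup checks whether the announcer has any right to the prefix, which is the gap the quoted story describes.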

This problem has been known for a long time. Why hasn’t it been fixed?

Design in Security; Don’t Wait to Defend

Gunnar recommends building in security instead of waiting to catch the horses after they’re out of the barn:
The way out of this is for security to get involved in building better systems, getting involved in the system development, Identity management, and coding. Come to the table with useful tools such as Threat Models and Misuse Cases, and make sure you are there early enough to have an impact. Three places to focus are application development, databases, and identity. Time for security to live in code and config not in Visio drawings.
As Gandhi supposedly said about western civilization: “That would be a good idea!”