NSA loves The Bahamas so much it records all its cellphone calls | Ars Technica

Documents obtained by former National Security Agency contractor Edward Snowden show that the NSA has covertly intercepted and recorded nearly all of the calls made to, from, or between cell phones in The Bahamas. The surveillance, reported by The Intercept, used legal monitoring access obtained by the Drug Enforcement Administration.

via NSA loves The Bahamas so much it records all its cellphone calls | Ars Technica.

Photos of an NSA “upgrade” factory show Cisco router getting implant | Ars Technica

A document included in the trove of National Security Agency files released with Glenn Greenwald’s book No Place to Hide details how the agency’s Tailored Access Operations (TAO) unit and other NSA employees intercept servers, routers, and other network gear being shipped to organizations targeted for surveillance and install covert implant firmware onto them before they’re delivered.

These Trojan horse systems were described by an NSA manager as being “some of the most productive operations in TAO because they pre-position access points into hard target networks around the world.”

The document, a June 2010 internal newsletter article by the chief of the NSA’s Access and Target Development department (S3261), includes photos (above) of NSA employees opening the shipping box for a Cisco router and installing beacon firmware with a “load station” designed specifically for the task.

via Photos of an NSA “upgrade” factory show Cisco router getting implant | Ars Technica.

Posted in NSA

US sends its giant spy drone to look for kidnapped Nigerian girls | Ars Technica

The drone that the United States Air Force sees as the replacement for the venerable U-2 spy plane is now flying surveillance missions over Nigeria as part of the search for 276 schoolgirls kidnapped by the Boko Haram terrorist group. A Northrop Grumman RQ-4 Global Hawk flew a mission over Nigeria on Tuesday, according to an NBC News report.

The Global Hawk, which first flew in 1998, can stay airborne for up to 28 hours and has a range of 8,700 miles. It has a wingspan close to that of a Boeing 747, weighs more than 32,000 pounds, and carries the Hughes Integrated Surveillance and Reconnaissance (HISAR) sensor system, a down-market version of the infrared, optical, and synthetic aperture radar gear Hughes developed for the U-2.

via US sends its giant spy drone to look for kidnapped Nigerian girls | Ars Technica.

Iran claims to clone US stealth drone, but it looks fake | Ars Technica

The Iranian military claims to have successfully duplicated the RQ-170 Sentinel drone that was captured in Iran in 2011, and it has put the drone on display alongside the original. The home-built version, Islamic Revolutionary Guard officers claim, could be used to attack US Navy ships in the Persian Gulf. But outside observers believe the copy is about as capable of that as the mock-up of a US aircraft carrier Iran built, allegedly for a movie set.

On May 11, Iranian television broadcast a report from an exhibition by the Islamic Revolutionary Guard Corps Aerospace Force in Tehran, where Ayatollah Ali Khamenei was shown the two unmanned aircraft by military officers. “Our engineers succeeded in breaking the drone’s secrets and copying them,” an officer said in the video broadcast. “It will soon take a test flight.”

The RQ-170, built by Lockheed Martin, is a turbofan-powered unmanned aircraft flown by the 30th Reconnaissance Squadron, part of the Air Force’s 432nd Wing (the Air Force’s drone command). The aircraft first gained notoriety as the secretive “beast of Kandahar” during operations in Afghanistan in 2007. The Air Force is believed to have purchased 20 Sentinels.

Little is known about their operational role, though their “flying-wing” airframe appears to have been designed for stealthy reconnaissance and surveillance missions. It’s believed that the aircraft captured in 2011 by the Iranians was being used to conduct surveillance of nuclear facilities.

The Iranians claimed that they were able to jam the Air Force’s data link to the drone and take control of it, bringing it down for an almost soft landing. They also claimed that the drone was recovered nearly intact and that the Revolutionary Guard was able to download data from its onboard systems. While the US government disputed those claims, later reports indicated that it was within the realm of possibility that the Iranians had managed to take over control of the drone.

Just what sort of “secrets” the RQ-170 surrendered to the Iranians is not clear. But aviation industry analysts who saw the footage of the Iranian clone of the RQ-170 have said it appears to be a fake—nothing more than a cheap fiberglass mockup put together for propaganda purposes, similar to the mockup of a stealth fighter the Iranians displayed last year. (Footage of that plane “flying” appeared to actually be of a small radio-controlled model.)

“It seems their fiberglass work has improved a lot,” an industry source familiar with the RQ-170 told US Naval Institute News. “It also seems that if it were a functional copy, versus a detailed replica, it wouldn’t necessarily have the exact same landing gear, tires, etc. They would probably just use whatever extra F-5 parts or general aviation parts they had lying around.”

via Iran claims to clone US stealth drone, but it looks fake | Ars Technica.

NSA routinely tapped in-flight Internet, intercepted exported routers | Ars Technica

In his new book No Place to Hide, Glenn Greenwald revealed a number of additional details on the “craft” and tools used by the NSA and its British counterpart, the GCHQ. While many of the capabilities and activities Greenwald details in the book were previously published in reports drawing from Edward Snowden’s vast haul of NSA documents, a number of new pieces of information have come to light—including the NSA’s and GCHQ’s efforts to use airlines’ in-flight data service to track and surveil targeted passengers in real time.

The systems—codenamed “Homing Pigeon” by the NSA and “Thieving Magpie” by the GCHQ—allowed the agencies to track which aircraft individuals under surveillance boarded based on their phone data.

via NSA routinely tapped in-flight Internet, intercepted exported routers | Ars Technica.

After 17-year march, Army still drags its boots on buying high-tech radios | Ars Technica

The US Army and other military services began development of software-defined radios to replace aging analog systems in 1997—long before Wi-Fi, broadband cellular, and high-definition television were even on the drawing board. The Joint Tactical Radio System (JTRS) program was supposed to revolutionize battlefield communications, turning soldiers and vehicles into nodes in an all-digital network that allowed data and video to flow as easily as voice traffic.

Little did the people working on the JTRS program know that the product of their labors would take 20 years to start being deployed in volume to troops—and how little of the original scope of the program would ever make it into service. The Army just announced this month its roadmap for rolling out JTRS-based Handheld, Man-Pack, and Small Form Factor (HMS) program radio systems in volume—three years from now. That means it may be 2018 before most soldiers see the radios in the field.

On May 2, at Fort Bliss, Texas, the Army’s HMS program team conducted its first “terrain walk-around” test of the AN/PRC-155 Manpack Radio, General Dynamics’ backpack offering for the program. The tests were in advance of a Network Integration Evaluation test at White Sands—the same evaluation exercise where, in 2011, the Ground Mobile Radio program met its Waterloo. The Army cancelled the GMR program after those tests and after an investment of $6 billion.

via After 17-year march, Army still drags its boots on buying high-tech radios | Ars Technica.

Massachusetts “Romneycare” site killed after rejecting Obamacare transplant | Ars Technica


The Massachusetts Health Connector is getting its plug pulled.

Nevada, Maryland, Massachusetts, Minnesota, and Oregon are members of a club that no one wants to join—all of these states have largely failed at getting their electronic health insurance exchange sites to work properly (or, in some cases, at all). Given the legislatively mandated deadline, the delays in delivery of requirements by the federal government, and the scale of the task that faced states developing their own healthcare exchange sites under the Affordable Care Act, people familiar with government information technology projects might tell you that it’s surprising that any of the websites worked at all.

But if any state had a greater shot at success, it was Massachusetts—the state that served as the model upon which the Affordable Care Act was based. Now, Massachusetts’ health exchange has decided to shutter its own site at least temporarily, switching to the federal exchange to buy time for a better fix.

States running their own exchanges need to be ready by November 15 for the next round of open enrollment for health plans. That has put a number of states with floundering exchange sites in a pinch. Oregon was the first state with its own exchange to completely abandon its own website after spending more than $300 million in federal grants on the project.

Oregon officials have publicly blamed the database giant Oracle, the state’s primary contractor for the site, for its failure. In March, the Government Accountability Office announced that it would conduct an investigation of the Cover Oregon exchange project; last week, The Wall Street Journal reported that the FBI is now conducting its own investigation.

In an official statement in April, an Oracle spokesperson said that “Oracle looks forward to providing any assistance the state needs in moving parts of Oregon’s health care exchange to the Federal system if it ultimately decides to do so.” Last week, the board of the exchange voted to move to the federal exchange.

via Massachusetts “Romneycare” site killed after rejecting Obamacare transplant | Ars Technica.

FAA fines ’80s band bassist for violating NYC airspace with quadrocopter | Ars Technica

The Federal Aviation Administration has slapped a camera-equipped quadrocopter operator with a $2,200 fine after he “endangered the safety of the national airspace system” with his three-pound aircraft last September. The fine comes just a few weeks after a federal administrative judge ruled in another case that the FAA has no jurisdiction over small remote-controlled aircraft, a ruling the FAA has appealed.

The fine was levied on David Zablidowsky, a 34-year-old Brooklynite and bassist for the 1980s cover band Rubix Kube, who flew his camera-equipped DJI Phantom quadrocopter off of a building on East 38th Street in Manhattan on September 30, 2013. In the process, he crashed the aircraft into multiple nearby buildings before it plummeted more than 20 stories to a sidewalk below, crashing 20 feet from a pedestrian. The pedestrian then took the drone and reported the incident to police.

via FAA fines ’80s band bassist for violating NYC airspace with quadrocopter | Ars Technica.

In his words: How a whitehat hacked a university and became an FBI target | Ars Technica

David Helkowski stood waiting outside a restaurant in Towson, Maryland, fresh from a visit to the unemployment office. Recently let go from his computer consulting job after engaging in some “freelance hacking” of a client’s network, Helkowski was still insistent on one point: his hack, designed to draw attention to security flaws, had been a noble act.

The FBI had a slightly different take on what happened, raiding Helkowski’s home and seizing his gear. Helkowski described the event on reddit in a thread he titled, “IamA Hacker who was Raided by the FBI and Secret Service AMAA!” Recently Ars sat down with him, hoping to get a better understanding of how this whitehat entered a world of gray. Helkowski was willing to tell practically everything—even in the middle of an ongoing investigation.

Until recently, Helkowski worked for The Canton Group, a Baltimore-based computer consulting firm serving, among other clients, the University of Maryland. Helkowski’s job title at The Canton Group was “team lead of open source solutions,” but he began to shift his concerns toward security after identifying problems on a University of Maryland server.

Read more at Ars Technica: In his words: How a whitehat hacked a university and became an FBI target | Ars Technica.

My PGP Public Key

I’ve now registered PGP keys through GPGtools for both my work and personal email addresses. If you’re trying to reach me on a sensitive topic, you can reach out to me at sean.gallagher at arstechnica dot com using the following public key to encrypt your message:

Short ID: 332B13CF
Key ID: E0A93113332B13CF
Fingerprint: 00FF E0BB B114 1A97 7A47 06F4 E0A9 3113 332B 13CF

Version: GnuPG/MacGPG2 v2.0.20 (Darwin)
Comment: GPGTools - https://gpgtools.org


NSA hacker in residence dishes on how to “hunt” system admins | Ars Technica

If you spend enough time perusing the Internet for helpful information on how to build a botnet or hack an online game, you’ll inevitably end up on a discussion board site filled with posts from various hackers eager to share that knowledge and build up their street cred. But even if you use Tor to explore the “dark Web” for such boards, you’ll never reach the 1337est board of them all—the discussion board hosted on the National Security Agency’s NSAnet.

The latest data dump from the archive of NSA webpages leaked by Edward Snowden contains a sampling of posts from the NSA’s internal hacker board by one author in particular: an NSA employee who The Intercept’s Ryan Gallagher and Peter Maass claim is the person who wrote presentations on attacking the Tor network. In one of his posts, the author outlines approaches to gaining access to networks used by individuals targeted for surveillance.


Read more of this post at Ars Technica: NSA hacker in residence dishes on how to “hunt” system admins | Ars Technica.

Posted in NSA

Sophisticated botnet steals more than $47M by infecting PCs and phones | Ars Technica

A new version of the Zeus trojan—a longtime favorite of criminals conducting online financial fraud—has been used in attacks on over 30,000 electronic banking customers in Europe, infecting both their personal computers and smartphones. The sophisticated attack is designed to circumvent banks’ use of two-factor authentication for transactions by intercepting messages sent by the bank to victims’ mobile phones.

the rest at: Sophisticated botnet steals more than $47M by infecting PCs and phones | Ars Technica.

Big Brother on a budget: How Internet surveillance got so cheap

Deep packet inspection, petabyte-scale analytics create a “CCTV for networks.”

The surveillance powers of CCTV are coming to a network near you, thanks to deep packet inspection and big data analytics.

When Libyan rebels finally wrested control of the country last year away from its mercurial dictator, they discovered the Qaddafi regime had received an unusual gift from its allies: foreign firms had supplied technology that allowed security forces to track nearly all of the online activities of the country’s 100,000 Internet users. That technology, supplied by a subsidiary of the French IT firm Bull, used a technique called deep packet inspection (DPI) to capture e-mails, chat messages, and Web visits of Libyan citizens.

The fact that the Qaddafi regime was using deep packet inspection technology wasn’t surprising. Many governments have invested heavily in packet inspection and related technologies, which allow them to build a picture of what passes through their networks and what comes in from beyond their borders. The tools secure networks from attack—and help keep tabs on citizens.

Narus, a subsidiary of Boeing, supplies “cyber analytics” to a customer base largely made up of government agencies and network carriers. Neil Harrington, the company’s director of product management for cyber analytics, said that his company’s “enterprise” customers (agencies of the US government and large telecommunications companies) are “more interested in what’s going on inside their networks” for security reasons. But some of Narus’ other customers, like Middle Eastern governments that own their nations’ connections to the global Internet or control the companies that provide them, “are more interested in what people are doing on Facebook and Twitter.”

Surveillance perfected? Not quite, because DPI imposes its own costs. While deep packet inspection systems can be set to watch for specific patterns or triggers within network traffic, each specific condition they watch for requires more computing power—and generates far more data. So much data can be collected that the DPI systems may not be able to process it all in real time, and pulling off mass surveillance has often required nation-state budgets.

Not anymore. Thanks in part to tech developed to power giant Web search engines like Google’s (analytics and storage systems that generally get stuck with the label “big data”), “big surveillance” is now within reach even of organizations like the Olympics.

Network security camera

The tech is already helping organizations fight the ever-rising threat of hacker attacks and malware. The organizers of the London Olympic games, in an effort to prevent hackers and terrorists from using the games’ information technology for their own ends, undertook one of the most sweeping cyber-surveillance efforts ever conducted privately. In addition to the thousands of surveillance cameras that cover London, there was a massive computer security effort in the Games’ Security Operation Centers, with systems monitoring everything from network infrastructure down to point-of-sale systems and electronic door locks.

The logs from those systems generated petabytes of data before the torch was extinguished. They were processed in real-time by a security information and event management (SIEM) system using “big data” analytics to look for patterns that might indicate a threat—and triggering alarms swiftly when such a threat was found.

The combination of the sophisticated analytics and massive data storage in big data systems with DPI network security technology has created what Dr. Elan Amir, CEO of Bivio Networks, calls “a security camera for your network.”

“There’s no question that within the next three to five years, not having a copy of your network data will be as strange as not having a firewall,” Amir told me.

The capability used at London’s Games doesn’t have a billion-dollar price tag. Nearly any organization on a budget can assemble something similar, in some cases with hardware already on hand and a free initial software download. And the potential applications go far beyond benign network security. With the ability to store data over long periods, companies and governments with smaller budgets could not only track what’s going on in social media, but reconstruct the communications between people over a period of months or even years, all with a single query.

“The danger here,” Electronic Frontier Foundation Technology Projects Director Peter Eckersley told Ars, “is that these technologies, which were initially developed for the purpose of finding malware, will end up being repurposed as commercial surveillance technology. You start out checking for malware, but you end up tracking people.”

Unchecked, Eckersley said, companies or rogue employees of those companies will do just that. And they could retain data indefinitely, creating a whole new level of privacy risk.

How deep packet inspection works

As we send e-mails, search the Web, and post messages and comments to blogs, we leave a digital trail. At each point where Internet communications are received and routed toward their ultimate destination, and at each server they touch, security and systems operations tools give every transactional conversation anything from a passing frisk to the equivalent of a full strip search. It all depends on the tools used and how they’re set up.

One of the key technologies that drives these tools is deep packet inspection. A capability rather than a tool itself, DPI is built into firewalls and other network devices. Deep packet inspection and packet capture technologies revolutionized network surveillance over the last decade by making it possible to grab information from network traffic in real time. DPI makes it possible for companies to put tight limits on what their employees (and, in some cases, customers) can do from within their networks. The technology can also log network traffic that matches rules set up on network security hardware: rules based on the network addresses that the traffic is going to, the type of traffic itself, or even keywords and patterns within its contents.

“Almost everything interesting happening in networking, especially with a slant toward cyber security, has some DPI embedded in it, even if people aren’t calling it that,” said Bivio’s Amir. “It’s a technology and a discipline that captures all of the processing and network activity that’s getting done on network traffic outside of the standard networking elements of packets—the addressing and routing fields. What gets people riled up a bit is the ‘inspection’ part, because somehow inspection has negative connotations.”

To understand how DPI works, you first have to understand how data travels across networks and the Internet. Regardless of whether they’re wired or wireless, Internet-connected networks generally use Internet Protocol (IP) to handle routing data between the computers and devices attached to them. IP sends data in chunks called packets: blocks of data preceded by handling and addressing information that lets routers and other devices on the network know where the data came from and where it’s going. That addressing information is often referred to in the networking world as Layer 3 data, a reference to its definition within the Open Systems Interconnection network model.
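To make the idea of addressing information in front of the data concrete, here is a minimal sketch that decodes the fixed 20-byte part of an IPv4 header using only Python's standard library. The function name, the dictionary keys, and the sample addresses (taken from the ranges reserved for documentation) are illustrative assumptions, not anything from a particular DPI product.

```python
import socket
import struct

# Layout of the fixed 20-byte IPv4 header, in network byte order.
IPV4_HEADER = "!BBHHHBBH4s4s"


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the Layer 3 fields a router reads to forward a packet."""
    if len(packet) < 20:
        raise ValueError("too short to hold an IPv4 header")
    (version_ihl, _tos, total_length, _ident, _flags_fragment,
     ttl, protocol, _checksum, src, dst) = struct.unpack(IPV4_HEADER, packet[:20])
    return {
        "version": version_ihl >> 4,                 # upper 4 bits
        "header_length": (version_ihl & 0x0F) * 4,   # lower 4 bits, in 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,                        # 6 = TCP, 17 = UDP
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Hand-built sample header (hypothetical addresses, checksum left at zero).
sample = struct.pack(
    IPV4_HEADER, 0x45, 0, 40, 1, 0, 64, 6, 0,
    socket.inet_aton("192.0.2.10"), socket.inet_aton("198.51.100.7"),
)
header = parse_ipv4_header(sample)
print(header["source"], "->", header["destination"])
```

Everything after these header bytes is the payload that deeper inspection looks into.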

The OSI Layers of an Internet data packet

Layer 1 (Physical): The format for the transmission of data across the networking medium, defining how data gets passed across it. WiFi (802.11) is a physical layer standard.
Layer 2 (Data link): Within a network segment, handles the physical addressing (the media access control, or MAC, addressing of devices on the network) and their communication. Ethernet and Point-to-Point Protocol are data link protocols.
Layer 3 (Network): Handles the logical addressing and routing of data, based on software-defined addresses. Internet Protocol headers are the Layer 3 data in a packet.
Layer 4 (Transport): Protocol information, such as in the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP), provides for error-checking and recovery and flow control of data.
Layer 5 (Session): Handles communications between applications, such as remote procedure calls, inter-process communications like “named pipes,” and the SOCKS proxy protocol.
Layer 6 (Presentation or Syntax): Data formatting, serialization, compression, and encryption services, like the Multipurpose Internet Mail Extensions (MIME) format.
Layer 7 (Application): The data sent for specific applications in formats such as HTTP for the request and delivery of Web content, File Transfer Protocol (FTP), IMAP and SMTP mail connections, and other application-specific formats.

Internet routers generally just look at Layer 3 data to determine which network path a packet gets relayed down. Network firewalls look a little deeper into the data when making a decision about whether to let packets pass onto the networks they protect. Packet-filtering firewalls typically look at Layer 3 and Layer 4, checking which transport protocol (such as TCP or UDP) and which port number the packets use (the port is commonly associated with a specific application; port 80, for example, is usually associated with Web services).
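A packet filter of this kind can be sketched as a first-match rule table keyed on transport protocol and destination port. The `Rule` class, the `filter_packet` function, and the example rules are hypothetical and far simpler than a real firewall's rule language.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    action: str                      # "allow" or "deny"
    protocol: Optional[str] = None   # "tcp", "udp", or None for any
    dst_port: Optional[int] = None   # destination port, or None for any


def filter_packet(rules: Iterable[Rule], protocol: str, dst_port: int,
                  default: str = "deny") -> str:
    """Return the action of the first rule matching the Layer 4 fields."""
    for rule in rules:
        if rule.protocol not in (None, protocol):
            continue
        if rule.dst_port not in (None, dst_port):
            continue
        return rule.action
    return default  # nothing matched, so fall back to the default policy


# Hypothetical policy: permit Web and DNS traffic, drop everything else.
rules = [Rule("allow", "tcp", 80), Rule("allow", "udp", 53)]
web_verdict = filter_packet(rules, "tcp", 80)
telnet_verdict = filter_packet(rules, "tcp", 23)
print(web_verdict, telnet_verdict)
```

Note that the filter never reads the payload; it decides on header fields alone, which is what separates it from the deeper inspection described next.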

The structure of an IP packet, and how its services match up to the OSI layers.

Application-layer firewalls, which emerged in the 1990s, look still deeper into network traffic. These set rules for network traffic based on the specific type of application the data within the packet was for. Application firewalls were the first real “deep packet inspection” devices, checking the application protocols within the packets themselves, as well as searching for patterns or keywords in the data they contain.
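As a rough illustration of checking the application protocol and searching the data for keywords, the sketch below recognizes a plain-text HTTP request and scans its payload for a watch list. The function name, the regular expressions, and the keywords are assumptions made for the example; production systems handle many protocols, fragmented streams, and encodings that this ignores.

```python
import re
from typing import Iterable

# Matches the request line of a plain-text HTTP/1.x request.
HTTP_REQUEST = re.compile(rb"^(GET|POST|HEAD|PUT|DELETE) (\S+) HTTP/1\.[01]\r\n")
HOST_HEADER = re.compile(rb"\r\nHost: ([^\r\n]+)", re.IGNORECASE)


def inspect_payload(payload: bytes, keywords: Iterable[str]) -> dict:
    """Identify the application protocol and list any keywords in the payload."""
    result = {"application": "unknown", "host": None, "matches": []}
    if HTTP_REQUEST.match(payload):
        result["application"] = "http"
        host = HOST_HEADER.search(payload)
        if host:
            result["host"] = host.group(1).decode("ascii", "replace").strip()
    lowered = payload.lower()
    # Case-insensitive substring search over the raw bytes.
    result["matches"] = [k for k in keywords if k.lower().encode() in lowered]
    return result


# Hypothetical request and watch list.
payload = b"GET /search?q=protest HTTP/1.1\r\nHost: example.com\r\n\r\n"
report = inspect_payload(payload, ["protest", "torrent"])
print(report)
```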

Traffic cops vs. traffic spies

Where DPI devices sit in the network flow varies based on their purpose. DPI-based “stateful” firewalls briefly delay, or buffer, packets to check the traffic stream as it passes through. Other systems designed for deeper analysis of network content tend to passively collect packet data as it streams through a network chokepoint, then send instructions to the firewall and other security appliances when they find something amiss.

The advantage that in-line DPI systems have is that holding the packets in a buffer allows them to handle the packets themselves before they’re sent on their way: intercepting their content and repackaging it, “forging” packets with new data, or removing data from within packet streams before it passes, altering data in flight. Spam-blocking firewalls, for example, use DPI to identify inbound e-mail message streams and check their headers and content for known spammers, viruses, phishing attacks, and other potentially harmful content. The firewall then reroutes those messages to quarantine or removes attachments entirely.

Web-filtering firewalls check outbound and inbound Web traffic for visits to sites that violate certain policies, or watch for Web-based malware attacks. Bivio’s Network Content Control System, for example, uses in-line DPI to allow network customers to set “parental controls” on their Internet traffic, evaluating the domains of websites as well as the content itself for adult or objectionable content within social networking sites and blogs. The system can also block “pharming” attacks, which use malicious DNS servers to hijack Web requests to another server (such as the ones used by the DNSChanger botnet).

The Bivio Networks NCCS appliance

Others go further, using their role at the edge of an enterprise network as a proxy for network clients to decrypt Secure Sockets Layer (SSL) content in Web sessions, essentially executing a “man in the middle” attack on their users. Barracuda Networks, for example, recently introduced a new version of its firewall firmware that adds new social network monitoring features that can decrypt SSL traffic to Facebook and other social networking services, then check the content of traffic for policy violations (including playing Facebook games during work hours).

Companies want these capabilities for a variety of reasons that fall loosely under “security,” including compliance with “e-discovery” requirements and preventing confidential data loss. But those capabilities also can be used for more wide-ranging monitoring of network users. For example, 13 of Blue Coat’s application firewalls were illegally transferred to Syria by way of a distributor in Dubai. The Web-filtering capabilities were allegedly used by the Syrian government to identify bloggers and Facebook users who expressed anti-government views within the country.

The privacy risks created by corporate use of these systems are significantly larger than those posed by government surveillance in the US, the EFF’s Eckersley said. “The systems that Barracuda and other companies are building are ripe for abuse. They have a small and debatable range of legitimate uses, and a large number of potentially illegitimate uses.” The ability to essentially run “man-in-the-middle” attacks on a large scale against employees and customers that these tools provide, he said, creates the risk of the data being abused by the company or IT staff.

DPI applications go far beyond simply enforcing policy. Once network operators started using DPI-based systems for security, other applications outside of security became possible as well. “The first one outside of the security market to use DPI was the (network) traffic management space,” said Bivio’s Amir. Companies such as Sandvine and Procera Networks built network traffic management systems that used DPI to improve overall network performance by giving priority to specific types of network traffic, performing “traffic shaping” or “packet shaping” to throttle bandwidth for some applications while giving priority to others.

“We can do a better job with network quality of service if the QOS is based on applications, and maybe subscribers, and use information that’s in the data flow already, but not if you just looked at IP addresses,” Amir explained.
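One simple way to picture application-based quality of service is a strict-priority queue: once DPI has labeled each packet with an application, the scheduler always sends the most important class first. This is only a sketch of that one idea; the class names and priority values are hypothetical, and commercial traffic managers also apply rate limits and fairness rules that are left out here.

```python
import heapq
import itertools

# Hypothetical application classes; a lower number is sent first.
PRIORITY = {"voip": 0, "web": 1, "bittorrent": 2}


class Shaper:
    """Strict-priority scheduler keyed on the application DPI identified."""

    def __init__(self):
        self._queue = []
        self._order = itertools.count()  # tie-breaker keeps arrival order

    def enqueue(self, application: str, packet: bytes) -> None:
        priority = PRIORITY.get(application, 1)  # unknown traffic is treated like web
        heapq.heappush(self._queue, (priority, next(self._order), packet))

    def dequeue(self) -> bytes:
        return heapq.heappop(self._queue)[2]

    def __len__(self) -> int:
        return len(self._queue)


shaper = Shaper()
shaper.enqueue("bittorrent", b"piece")
shaper.enqueue("web", b"page")
shaper.enqueue("voip", b"voice")
sent = [shaper.dequeue() for _ in range(len(shaper))]
print(sent)
```

A scheduler keyed only on IP addresses could not make this distinction, since the same host may be sending voice, Web, and file-sharing traffic at once.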

Information discovered within the application data in packets also could be used by ISPs to do other things, such as targeted advertising. One failed DPI-based effort comes from a company called NebuAd. It tried to sell ISPs on this advertising idea, signing up Charter Communications and some smaller providers for a trial of a service that not only monitored the content of users’ Web traffic to target ads, but even injected data into packets, adding JavaScript that dropped tracking cookies into users’ browsers to do even more thorough behavior-based targeting of advertisements. NebuAd went bankrupt after it drew the attention of Congress, and Charter and the other ISPs in the trial dropped the “enhanced online advertising service” the company provided.

Other behavior-based marketing companies, such as Phorm, continue to offer “Web personalization” services that include discovery of users’ interests integrated with DPI-based Web security to block malicious sites. Another firm, Global File Registry, aims to go further, by injecting ISPs’ own advertisements into search-engine results through DPI and packet forging. The company has combined file-recognition technology from Kazaa with DPI to make it possible for ISPs to re-route links to pirated files online to sites offering to sell licensed versions of them.

Comcast has already tested the anti-piracy waters with DPI, running afoul of the FCC’s efforts to enforce network neutrality. The company’s ISP business, which uses Sandvine’s DPI technology, moved to block peer-to-peer file sharers using BitTorrent as part of its traffic management. The FCC ordered Comcast to stop (primarily because Comcast was injecting forged packets into network traffic to shut down BitTorrent sessions), but that order was later struck down by a Federal appeals court.

But these systems were designed for making quick decisions about traffic. And while they generally have reporting features that can give security managers and analysts insight into what traffic (and which user) has violated a particular policy, there’s a limit to how much information about that traffic they can capture effectively.

Drinking from the fire hose

On the other end of the spectrum is packet capture technology, which monitors the traffic passing through a network interface and records all of it to disk storage for forensic analysis. With the right analysis tools, packet capture systems such as the DeepSee appliances from Solera Networks allow security analysts to reconstruct the entirety of transactions between two systems across the Internet gateway at sustained rates of five gigabits per second and peaks in traffic up to 10 gigabits per second. At the sustained rate, that adds up to daily data captures of about 54 terabytes. Even at Solera’s advertised compression ratio of 10:1 in its new Solera DB storage architecture, the cost of storing all that data, especially for larger networks, quickly adds up.
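The 54-terabyte figure follows directly from the sustained capture rate. The short calculation below works it through, assuming decimal units (1 terabyte = 10^12 bytes) and a link that stays at five gigabits per second for the full day; the function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_capture_terabytes(gigabits_per_second: float) -> float:
    """Terabytes recorded in a day at a constant capture rate."""
    bytes_per_second = gigabits_per_second * 1e9 / 8  # 8 bits per byte
    return bytes_per_second * SECONDS_PER_DAY / 1e12


raw_tb = daily_capture_terabytes(5)   # sustained rate from the text
stored_tb = raw_tb / 10               # after the advertised 10:1 compression
print(f"raw: {raw_tb:.1f} TB/day, compressed: {stored_tb:.1f} TB/day")
```

Even compressed, that works out to several terabytes of new storage every day for a single five-gigabit link.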

The application data highlighted in a raw packet from a Web “get” request captured by NetShark, a packet capture tool.
Sean Gallagher

Packet capture is “valuable, but it’s limited,” Amir said. “You can’t record the whole Internet; you can’t record things in an unlimited fashion and expect to have anything meaningful to go back to. That’s a short-term solution for smaller networks. What if there was a breach that you discover three or four months later? How do you go back and see what happened on your network? That technology has not been developed until very recently.”

That technology is actually a synthesis of two. The first is DPI-based network monitoring systems that pre-process network data—capturing and storing not entire packets, but selective metadata from them and their aggregated application data such as e-mail attachments, instant messages, and social media posts.

“There’s no limit to the data you can extract from the payload, as long as you understand the payload,” said Amir. “But there’s a tradeoff of how much data you’re going to extract with how much storage capacity that’s going to take. If you go too deep, you’re sliding toward the packet capture realm. If you extract too little, you’re essentially back to IP logs which aren’t terribly useful.”

NarusInsight, Narus’ DPI-based network monitoring and capture tool, is designed to strike a balance in that tradeoff. It uses a network probe device called the Intelligent Traffic Analyzer (ITA), which gets “tapped” into a network choke point. “There are usually six to 14 tap points in an enterprise network” belonging to customers of the scale Narus usually deals with, said Narus’ Harrington, “usually at the uplinks to the network backbone.”

Instead of grabbing everything that passes, the ITA watches for anomalies in traffic and aggregates packets into two kinds of “vectors” for each session: a human-readable transcript of all the packets in a particular connection, and an aggregation of all the application data that was sent in that session.
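Narus’ actual vector format is proprietary and is not described here, so the following is only a minimal sketch of the general idea: grouping packets by connection and keeping both a readable per-packet transcript and the concatenated application data. The Packet and SessionVectors types, and the use of "ip:port" strings as endpoints, are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Packet:
    src: str        # "ip:port" of the sender
    dst: str        # "ip:port" of the receiver
    payload: bytes  # application-layer bytes carried by this packet

@dataclass
class SessionVectors:
    transcript: list = field(default_factory=list)          # one readable line per packet
    app_data: bytearray = field(default_factory=bytearray)  # concatenated payloads

def session_key(pkt: Packet) -> tuple:
    # Sort the endpoints so both directions of a connection share one key.
    return tuple(sorted((pkt.src, pkt.dst)))

def aggregate(packets):
    """Group packets by session and build both vectors for each session."""
    sessions = defaultdict(SessionVectors)
    for pkt in packets:
        vec = sessions[session_key(pkt)]
        vec.transcript.append(f"{pkt.src} -> {pkt.dst} ({len(pkt.payload)} bytes)")
        vec.app_data.extend(pkt.payload)
    return dict(sessions)

packets = [
    Packet("10.0.0.5:51000", "203.0.113.9:80", b"GET / HTTP/1.1\r\n"),
    Packet("203.0.113.9:80", "10.0.0.5:51000", b"HTTP/1.1 200 OK\r\n"),
]
sessions = aggregate(packets)
for key, vec in sessions.items():
    print(key, vec.transcript, bytes(vec.app_data))
```

A real probe would also track protocol state, reorder out-of-sequence packets, and keep the two directions of a conversation separate; this sketch skips all of that.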

Narus’ ITAs support network taps at speeds from 100 megabits to 10 gigabits per second in full duplex, meaning they could face traffic rates up to 20 gigabits per second. The amount of that data that can be captured and processed “all depends on processing that needs to be done,” Harrington said, and that depends on how many parameters (or “tag pairs”) the system is configured to detect.

“Typically with a 10 gigabit Ethernet interface, we would see a throughput rate of up to 12 gigabits per second with everything turned on. So out of the possible 20 gigabits, we see about 12. If we turn off tag pairs that we’re not interested in, we can make it more efficient.”

The data from the ITA is then sent using a proprietary messaging protocol to a collection of logic servers—virtual machines running in rackmounted Dell server hardware that further aggregate and process the data. A single Narus ITA can process the full contents of 1.5 gigabytes worth of packet data per second—5400 gigabytes per hour, or 129.6 terabytes per day per network tap. By the time the data is processed into aggregated results by the logic servers, petabytes of daily raw network traffic have been reduced down to gigabytes of tabular data and captured application data.
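The 1.5 gigabytes per second figure appears to be the 12 gigabits per second throughput Harrington cited, expressed in bytes. A quick check of the chain of numbers, assuming decimal units and a tap that runs at that rate around the clock:

```python
def gbps_to_gigabytes_per_second(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8

per_second = gbps_to_gigabytes_per_second(12)  # gigabytes per second
per_hour = per_second * 3600                   # gigabytes per hour
per_day_tb = per_hour * 24 / 1000              # terabytes per day
print(per_second, per_hour, per_day_tb)        # 1.5 5400.0 129.6
```

At that rate, the six to 14 taps Harrington described would together see somewhere between roughly 0.8 and 1.8 petabytes per day, which is consistent with the "petabytes of daily raw network traffic" the logic servers are said to reduce.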

But as impressive as the analytical power of a NarusInsight environment is, there are still limits to the type of analysis that can be done by pattern matching in a small window of data. Unknown threats to security—“zero day” exploits for which there are no known signatures, and which evade statistical analysis by disguising themselves as legitimate network traffic—could slip by DPI tools working on their own. This can happen even if there are signs elsewhere in IT systems, such as server system logs and system auditing tools, that something is amiss.

For Narus users, that typically means exporting the data out of NarusInsight’s analytical environment to another tool for forensic investigation and other deeper analysis. This could be a data warehouse or a “big data” analytical database like Palantir, Hadoop-based systems like Cloudera and Hortonworks, or Splunk. Narus recently announced a partnership with Teradata to provide for large-scale analytics of NarusInsight’s output, using Tableau’s visualization software and analytical SQL queries as a front end for analysts.

“We provide our customers with a starting kit, a common dashboard” for analysis, Harrington said. From there, they can summarize and aggregate the information from the various log data, which are stored in Teradata’s multidimensional data warehouse format. And, he added, Narus is working on a Hadoop-based analytical tool using MapReduce processes to dig even further into network traffic patterns.

But other players in the network security market are moving to put the analytical power of big data systems at the center of their network monitoring solutions, rather than as an add-on. Big-volume, high-speed data storage and management technologies like Hadoop grew out of the needs of “hyperscale” Web services such as Google. By harnessing this power, data analysis software from Splunk and LogRhythm or integrated solutions such as Bivio’s NetFalcon make it possible to throw much deeper analytical horsepower at DPI data and aggregate it with other sources, both in real time and as part of long-ranging forensic analysis.

LogRhythm’s customizable analytics dashboard provides a tailored view to DPI and other data with visualization in real time.

Google-sized surveillance

NetFalcon launched as a product just over a year ago. It uses a columnar database format similar to Google’s BigTable and Teradata’s Aster database systems as its data store, and can perform both real-time and after-the-fact analysis on data picked up by its network probes. Each probe can handle up to 10 gigabits per second, and the “correlation engine” that takes in all of the inputs can pull in over 100 gigabits per second for processing. NetFalcon’s “retention server” database takes inputs not only from the system’s network probes, but also pulls in feeds from external log sources, Simple Network Management Protocol “trap” events, and other databases. It correlates all the traffic and event data for weeks or even months. “Hundreds of terabytes or petabytes of data, but laid out in such a way that you can do queries and searches very rapidly,” Amir said.

In an enterprise environment, Bivio could store months of data from these sources; in law enforcement applications, that data could scale to years. “We’re not storing the network data, we’re classifying it, categorizing it, breaking it up into its constituent pieces based on DPI, preprocessing it, and correlating it with external events,” Amir explained. The sources of information that could be pulled into NetFalcon’s database extend beyond the typical IT sources. “You could correlate info you’re getting over a mobile network along with geolocation data,” he continued. “Then when you’re doing the analytics, have the data right there and take advantage of it.” Some of the potential uses include correlating physical devices with online accounts to uncover individuals’ online identities, and establishing the connections between individuals by mapping their network interactions.

Splunk allows organizations to do the same sort of fused analysis, taking in data generated by an organization’s existing DPI-powered systems and combining it with server logs or just about any other machine- or human-generated data that the organization would want to pull in. Splunk is designed to process large quantities of raw ASCII data from nearly any source, applying MapReduce functions to the contents to extract fields from the raw data, index it, and perform analytical and statistical queries. Mark Seward, Splunk’s director of marketing for security and compliance, described it to Ars as “Google meets Excel.”
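To make the map-and-reduce idea concrete, here is a generic sketch in Python. It is not Splunk’s implementation or its search language; the log format, field names, and functions are all hypothetical. The map step pulls key=value fields out of each raw line, and the reduce step totals bytes per application.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical raw log lines; real sources would each have their own format.
RAW_LINES = [
    "2012-08-01T10:00:01 src=10.0.0.5 dst=203.0.113.9 app=http bytes=512",
    "2012-08-01T10:00:02 src=10.0.0.7 dst=198.51.100.4 app=bittorrent bytes=90000",
    "2012-08-01T10:00:03 src=10.0.0.5 dst=203.0.113.9 app=http bytes=2048",
]

FIELD_RE = re.compile(r"(\w+)=(\S+)")

def extract_fields(line: str) -> dict:
    """Map step: turn one raw line into a dict of key=value fields plus its timestamp."""
    timestamp, _, rest = line.partition(" ")
    fields = dict(FIELD_RE.findall(rest))
    fields["time"] = timestamp
    return fields

def add_bytes(totals: Counter, event: dict) -> Counter:
    """Reduce step: accumulate bytes transferred per application."""
    totals[event["app"]] += int(event["bytes"])
    return totals

events = map(extract_fields, RAW_LINES)
bytes_per_app = reduce(add_bytes, events, Counter())
print(dict(bytes_per_app))  # {'http': 2560, 'bittorrent': 90000}
```

Because each line is parsed independently, the map step can be spread across many machines, and only the small per-application totals need to be combined at the end.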

Splunk can also distribute its flat-file databases across multiple file stores. The store for a particular application “can be a 10 terabyte flat file distributed across multiple offices around the globe,” Seward said. “When you search Splunk from a search head, it doesn’t care where the data is. It sees it all as virtual flat file.”

While Splunk is a general-purpose analytics system, there are enterprise security and forensics dashboards that have been prebuilt for it, and there’s an existing marketplace of analytics applications that can be put on top of the system to do different sorts of analysis. “We have a site called Splunkbase that has over 300 apps,” Seward said, “about 40 of which are security apps written by our engineers or by customers. A couple [apps] are integrations with Solera and NetWitness.” Even raw packet data can be dumped in ASCII into Splunk in real time and time-indexed, if someone wants to go to that level of detail.

The addition of log and other data from the network is essential to catching security problems caused by things like an employee bringing a device to work that has been infected by malware or otherwise been exploited, Seward said. “What security analysts are finding is that the security architecture of the enterprise gets bypassed when you have people bring their own device to work. Those can get spearphished, or get malware, and when they come in they can allow attacks in that bypass half the gear you have to detect intrusions. Malware does its thing behind your credentials.”

Having access to authentication data for users, and combining it with location information—such as when they’ve used an electronic key card to enter or leave a building, or when they log into various applications—allows systems like Splunk and NetFalcon to find a baseline pattern in people’s behavior and watch for unusual activities. “You have to think like a criminal,” Seward said, “and monitor for credentialed activities that, looked at in a time-indexed pattern, look odd.”

A sample Splunk dashboard. Splunk’s analytics language and a collection of application patterns allow users to build their own security monitoring dashboards, providing real-time visualization as well as historical analysis.

One reason why companies are increasingly interested in tools like NetFalcon and Splunk is for “data loss prevention”—blocking leaks of sensitive corporate data via e-mail, social media, and instant messaging, or the wholesale theft of data by hackers and malware using encrypted and anonymized channels.

“TOR is a good example,” Amir said. “Things like onion routers are sophisticated tools designed exactly to circumvent real-time mechanisms that would block that sort of traffic.” Analysts and administrators could search for traffic going to known onion router endpoints, and follow the trail within their own networks back to the originating systems.
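A retrospective search of that kind can be sketched as a simple filter over stored flow records. The relay list, the record layout, and the addresses below are all hypothetical (the addresses come from ranges reserved for documentation); a real deployment would query its own data store and a maintained list of relay addresses.

```python
from collections import defaultdict

# Hypothetical inputs: a set of known relay addresses and stored flow records.
KNOWN_ONION_RELAYS = {"198.51.100.77", "203.0.113.200"}

FLOWS = [
    {"time": "2012-08-01T09:15:00", "src": "10.0.3.21", "dst": "203.0.113.200"},
    {"time": "2012-08-01T09:16:30", "src": "10.0.1.8", "dst": "192.0.2.10"},
    {"time": "2012-08-02T14:02:11", "src": "10.0.3.21", "dst": "198.51.100.77"},
]

def hosts_contacting(relays, flows):
    """Return {internal source: [timestamps]} for flows whose destination is a known relay."""
    hits = defaultdict(list)
    for flow in flows:
        if flow["dst"] in relays:
            hits[flow["src"]].append(flow["time"])
    return dict(hits)

print(hosts_contacting(KNOWN_ONION_RELAYS, FLOWS))
# {'10.0.3.21': ['2012-08-01T09:15:00', '2012-08-02T14:02:11']}
```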

Because these systems have a long memory, they’re able to catch patterns over longer periods of time and spot them instantly when they occur again, acting on them automatically. Both NetFalcon and Splunk are capable of launching automated responses to what gets discovered in data. In Splunk, the events are launched by continuous real-time searches of data as it’s streamed. NetFalcon’s “triggering” works in a similar way, as NetFalcon’s correlation engine processes incoming packet data, or when patterns are found when running an analytical query. Those actions could be sending configuration changes to a firewall, changing the settings on network capture devices, or sending an alert to an administrator about a problem.
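Neither product’s trigger mechanism is detailed here, but the general pattern of pairing a condition with an automated response can be sketched as follows. The Trigger type, the two example rules, and the placeholder actions are assumptions for illustration; the actions only return strings rather than touching any real firewall or mail system.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Trigger:
    name: str
    matches: Callable[[dict], bool]  # condition evaluated against each event
    action: Callable[[dict], str]    # response to run when the condition holds

def block_at_firewall(event: dict) -> str:
    # Placeholder: a real system would push a rule to the firewall here.
    return f"block {event['src']}"

def alert_admin(event: dict) -> str:
    # Placeholder: a real system would send mail or open a ticket here.
    return f"alert: {event['src']} sent {event['bytes']} bytes"

TRIGGERS = [
    Trigger("known-bad-destination", lambda e: e["dst"] in {"203.0.113.200"}, block_at_firewall),
    Trigger("large-upload", lambda e: e["bytes"] > 1_000_000, alert_admin),
]

def process(events, triggers=TRIGGERS):
    """Evaluate every trigger against every event as it streams in."""
    responses = []
    for event in events:
        for trig in triggers:
            if trig.matches(event):
                responses.append((trig.name, trig.action(event)))
    return responses

stream = [
    {"src": "10.0.3.21", "dst": "203.0.113.200", "bytes": 400},
    {"src": "10.0.1.8", "dst": "192.0.2.10", "bytes": 5_000_000},
]
for name, response in process(stream):
    print(name, "->", response)
```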

With the help of Splunk, security can automatically respond to any suspicious network activity at the University of Scranton’s Weinberg Memorial Library.

Security on a budget

NetFalcon is targeted at very specific audiences: law enforcement agencies, telecom carriers and large ISPs, and very large companies in heavily regulated or secretive industries willing to pay for what amounts to an intelligence-community-grade solution. But for other organizations that already have application firewalls, intrusion detection systems, or other DPI systems installed, there may not be a budget or need for Bivio’s type of technology. Take, for example, the University of Scranton, which uses Splunk to drive its information security operations.

Unlike NetFalcon, Splunk “is a huge database, but it doesn’t come with preconfigured alerts,” said Anthony Maszeroski, Information Security Manager at the University of Scranton (located in Scranton, Pennsylvania). The university has about 5,200 students—about half of whom live on campus—and has turned Splunk into the hub of its network security operations, using it to automate a large percentage of its responses to emerging threats.

Maszeroski said the IT department at Scranton pulls in data from a variety of systems. The campus’ wireless and wired routers send logs for Dynamic Host Configuration Protocol and Network Address Translation events to Splunk; each entry includes the physical MAC address of the connecting device along with a timestamp. This allows administrators to search the database by device address and follow where a device has connected from on campus. The database also pulls in information on outbound DNS queries and other types of application traffic, enterprise system logs, and events from the University’s intrusion prevention system. The Splunk database of the University of Scranton Information Security Office is “close to a terabyte” in size, Maszeroski said, and “our standard op procedure is to throw everything away after 90 days. We’re also limited by budget and storage capacity.”

One frequent activity that Splunk has helped the University automate is processing Digital Millennium Copyright Act takedown notices after a student is found to be sharing pirated content, either from sites hosted on their own computers or over BitTorrent. “We needed an automated, instant way of locking those down,” Maszeroski said. Data brought into Splunk can be searched for BitTorrent traffic, which can then be identified by MAC address; the University’s information security office has built a Java application that uses Splunk’s Web API to find the offending MAC address and then “cut the person off at a switch or wireless level.”
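The University’s Java application is not shown, so the following is only a sketch of the lookup at the heart of that process: given the IP address and time named in a notice, find which device held that address. It works over an in-memory list of hypothetical DHCP lease records rather than Splunk’s API, and it ignores the Network Address Translation step a real campus network would also need to resolve.

```python
from datetime import datetime

# Hypothetical DHCP lease records: (lease start, lease end, IP, MAC).
LEASES = [
    (datetime(2012, 8, 1, 8, 0), datetime(2012, 8, 1, 12, 0), "10.0.3.21", "00:11:22:33:44:55"),
    (datetime(2012, 8, 1, 12, 0), datetime(2012, 8, 1, 16, 0), "10.0.3.21", "66:77:88:99:aa:bb"),
]

def mac_for(ip: str, when: datetime, leases=LEASES):
    """Return the MAC address that held `ip` at time `when`, or None if unknown."""
    for start, end, lease_ip, mac in leases:
        if lease_ip == ip and start <= when < end:
            return mac
    return None

print(mac_for("10.0.3.21", datetime(2012, 8, 1, 13, 30)))  # 66:77:88:99:aa:bb
```

The timestamp matters because the same IP address can be leased to different devices over the course of a day, as the two records above show.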

DHCP data can be used to track down where offending devices are. And the DHCP log data allows the information security office to help the University’s public safety department look for stolen assets. When someone reports a stolen laptop or tablet, the office can do a quick search to see where it has been on the campus network and if it’s still connected.
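A stolen-device search of this kind amounts to filtering connection events by MAC address and sorting by time. The sketch below assumes a hypothetical event layout of timestamp, MAC address, and the access point or switch that saw the device; the names are illustrative.

```python
# Hypothetical connection events: (ISO timestamp, MAC address, access point or switch).
EVENTS = [
    ("2012-08-01T09:02:00", "00:11:22:33:44:55", "library-ap-3"),
    ("2012-08-01T11:40:00", "66:77:88:99:aa:bb", "dorm-b-switch-2"),
    ("2012-08-01T14:15:00", "00:11:22:33:44:55", "student-center-ap-1"),
]

def history(mac: str, events=EVENTS):
    """All sightings of a device, oldest first (ISO timestamps sort chronologically)."""
    return sorted((ts, where) for ts, seen_mac, where in events if seen_mac == mac.lower())

def last_seen(mac: str, events=EVENTS):
    """Most recent sighting of a device, or None if it never appeared."""
    sightings = history(mac, events)
    return sightings[-1] if sightings else None

print(last_seen("00:11:22:33:44:55"))  # ('2012-08-01T14:15:00', 'student-center-ap-1')
```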

Splunk’s dashboard also makes it easier to pick up on things that fall outside the norm. “We can do a statistical look at logs to see if an account is sending too much e-mail to check for compromised Web mail accounts,” Maszeroski said. “Also, it’s very unusual for someone to be logging into our Web server from Nigeria. We can look for multiple usernames logging in from one IP address, or look for one logging in from different geographic areas.” The same goes for the University’s VPNs.
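Two of the checks Maszeroski describes, many usernames from one IP address and one username from several geographic areas, reduce to grouping login events two different ways. The sketch below assumes each login record already carries a country code from some GeoIP lookup; the records, the threshold, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical login events: (username, source IP, country code from a GeoIP lookup).
LOGINS = [
    ("alice", "192.0.2.10", "US"),
    ("bob", "198.51.100.4", "US"),
    ("carol", "198.51.100.4", "US"),
    ("dave", "198.51.100.4", "US"),
    ("alice", "203.0.113.50", "NG"),
]

def shared_ips(logins, max_users=2):
    """IPs from which more than `max_users` distinct usernames logged in."""
    users_by_ip = defaultdict(set)
    for user, ip, _country in logins:
        users_by_ip[ip].add(user)
    return {ip: sorted(users) for ip, users in users_by_ip.items() if len(users) > max_users}

def multi_country_users(logins):
    """Usernames seen logging in from more than one country."""
    countries_by_user = defaultdict(set)
    for user, _ip, country in logins:
        countries_by_user[user].add(country)
    return {user: sorted(c) for user, c in countries_by_user.items() if len(c) > 1}

print(shared_ips(LOGINS))           # {'198.51.100.4': ['bob', 'carol', 'dave']}
print(multi_country_users(LOGINS))  # {'alice': ['NG', 'US']}
```

Neither result proves a compromise on its own, since shared addresses and travel are both common, so in practice these would feed a review queue or be combined with other signals.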

“If there’s an event we’re absolutely certain is an indication of badness, we can programmatically run a script within a minute to cut off IP address at our network perimeter.”

Absolute power

Yes, these capabilities make it possible for organizations to both prevent security breaches and track down the reasons for the ones that slip by. But the ability to survey almost any kind of network traffic, combine it in real time with location-based data (plus other physical-world information), and then store it indefinitely is a huge privacy concern. Even without logging on, individuals can leave patterns identifying themselves in their digital footprints that could be used by others for less-than-ethical purposes, said EFF’s Eckersley.

“If you’re in the habit of loading a few particular blogs,” Eckersley said, “that pattern will be repeated whether you’re in the office or at home. If networks end up with extensively deployed pattern recognition systems, users are going to need very strong assurances that the data isn’t being kept. And it’s going to be difficult for companies to give that sort of assurance, because the tendency is to keep everything. Our advice is not to work for employers who demand to survey you in the office.”

And companies in some parts of the world, including ISPs, may soon find themselves being asked to keep everything. In the UK, for example, a proposed law announced in the Queen’s Speech in April would require ISPs and others to retain metadata obtained from deep packet inspection for digital communications—e-mails, text messages, instant messages and webpage visits, among other things—for up to a year.

In the US, Senator Joe Lieberman’s Cybersecurity Act of 2012 would have pushed for larger use of systems like NetFalcon and other DPI-based systems that provide “continuous monitoring” within government. It would have explicitly given private network operators the go-ahead “notwithstanding the… Foreign Intelligence Surveillance Act of 1978… and the Communications Act of 1934” to survey their networks and share information collected that might have some bearing on cybersecurity with the Department of Homeland Security and other agencies. The bill was filibustered by Republicans because of regulations it put on industry, but parts of the bill may be pushed forward by the Obama administration as part of an executive order.

Perhaps the proliferation of such surveillance is inevitable—it is what allowed the Olympics to proceed without any major incident, after all. And certainly, the use of big data analytics would be an improvement on some of the electronic intelligence systems currently used by US agencies, considering the recent revelations about the sad state of the FBI’s management of surveillance data. But the fact remains that these systems, as automated as they are, are only as good as the people who use them—both in terms of performance and privacy.