Bulbous, Not Tapered

Foo-fu and other favorites…

Capacity Planning for Snort IDS

Snort is a very capable network intrusion detection system, but planning a first-time hardware purchase can be difficult. It requires fairly deep knowledge of x86 server performance and of network usage patterns at your site, along with some Snort-specific knowledge. Documentation is sparse, and current planning guides tend to focus on one or two factors in depth without addressing other broad issues that can cause serious performance problems. This post aims to be a comprehensive but high-level overview of the issues that must be considered when sizing a medium to large Snort deployment.

A Note About Small Sites

Small Snort deployments don’t require much planning. Almost any system or virtual machine will suffice to experiment with Snort on a DSL or cable internet connection with a bandwidth of 5-10Mbits/sec, so just jump right in. If you need to monitor 50-100Mbits/sec of network traffic, or 5-10Gbits/sec, then this guide can help you size your sensor hardware.

Know Your Site

It helps to know a few things about your site before you start planning.

The most common way to get started is to monitor your internet link(s). Many organizations also expand to monitor some internal links: data-center routers, site-to-site links, or networks with VIP workstations. Unless you know what you’re doing, I suggest starting with your internet links and expanding once you’ve got that performing well. There are generally far fewer internet links to consider, and they are often much lower bandwidth than internal links, which can make your first deployment simpler.

Life is simple if you have a single internet connection at a single site. If your network is more complicated, you’ll need to work with the team that manages your routers. They can help you figure out how many locations will need a sensor and how many capture interfaces each of those sensors will need to monitor the links at that site.

How much traffic do you need to monitor?

The single biggest factor when sizing your Snort hardware is the amount of traffic it must monitor. The values to consider are the maximum burst speed of each link and its average daily peak. It’s common to have burst capacity well in excess of actual usage, and when you design your sensors you must decide which traffic level you’re going to plan for. Planning for the burst value ensures that you won’t drop packets even in a worst-case scenario, but may be much more expensive than planning for the average daily peak.

For example, it’s common to contract with an ISP for 100Mbits/sec of bandwidth that is delivered over a 1000Mbits/sec link. The average daily peak for such a link may be 60Mbits/sec, but on rare occasions it may burst up to the full 1000Mbits/sec for short durations. A sensor designed for the relatively small daily peak is inexpensive and simple to manage, but during a full-rate burst a sensor that can only keep up with roughly 200Mbits/sec will miss about 800 of every 1000 megabits, a packet loss of 80% or more.

If MRTG or Nagios graphs of router utilization are available, they can be very helpful in capacity planning.
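
If graphs aren’t available, the same interface counters that MRTG reads can be sampled directly over SNMP. A minimal sketch, assuming net-snmp is installed, SNMPv2c read access with the community string “public”, and that ifIndex 3 is the internet-facing interface (the hostname, community, and index are placeholders you’d adjust for your site):

# sample the 64-bit input byte counter twice, 60 seconds apart
snmpget -v2c -c public router1.example.com IF-MIB::ifHCInOctets.3
sleep 60
snmpget -v2c -c public router1.example.com IF-MIB::ifHCInOctets.3
# (second value - first value) * 8 / 60 = average inbound bits/sec over the interval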

Inline, Tap, Span, or VACL Capture

There are various ways to extract traffic for examination. Inline deployments, where Snort is used as an intrusion prevention system, should be treated with great caution because sizing problems and configuration issues related to Snort can cause network problems and outages for all your users. When running a detection-only configuration in conjunction with taps, spans, or VACL captures, Snort problems generally don’t cause user-facing network outages and are a much lower risk.

Security teams generally favor taps due to their consistent performance even when a router is overloaded, but there are successful Snort deployments that utilize all of the above methods of obtaining traffic for inspection. Ntop.org has a good document on making the tap vs span decision, and the Wikipedia page on network taps provides informative background as well.

Operating System Expertise

Consider which operating systems your technical staff have expertise in. It is common to run high-performance Snort deployments on various Linux distributions or on FreeBSD. At one time, FreeBSD had a considerable performance advantage over equivalent Linux systems, but it is currently possible to build a 10Gbit/sec deployment on Linux- or BSD-based systems using roughly equivalent hardware.

I recommend against deploying on Windows because not all Snort features are supported on that platform. Notably, shared-object rules do not function on Windows as of Snort 2.9.0.5. While there are far fewer shared-object rules than normal “GID 1” rules, and they are released less frequently, they can still be a useful source of intelligence.

I also recommend against deploying Snort on *nixes other than Linux or BSD. Although Snort may work well on these platforms, the community employing them is much smaller. It will be much more difficult to find guidance on any platform-specific issues that you encounter.

It’s worth mentioning that my own experience is with high-performance Snort deployments on Linux, and parts of this post reflect that bias.

Single-Threading vs Multiple-CPUs

Snort is essentially single-threaded, which means that out of the box it doesn’t make effective use of multiple CPUs (technically there is more than one thread in a Snort process, but the others are used for housekeeping tasks that don’t require much CPU power, not for scaling traffic analysis across multiple CPUs). As of August 2011, Snort on a single CPU can be tuned to examine 200-500Mbits/sec, depending on the size of the ruleset used.

It’s possible to scale to 10Gbits/sec by running multiple copies of Snort on the same computer, each using a different CPU. A multi-Snort/multi-CPU configuration is quite a lot more complex to manage than a single-CPU deployment. Traffic from high-capacity links must be divided up into 200-500Mbit/sec chunks that can be examined by a single CPU; techniques to perform this load-balancing are discussed in the next section. Additionally, startup scripts often must be customized, and it can be difficult to manage multiple configuration files and log files. In spite of the management complexity, large organizations have successfully managed high-performance Snort deployments this way for many years.
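
As an illustration only (interface names, config paths, and CPU numbers below are hypothetical, not taken from any particular deployment), a multi-Snort startup script usually boils down to launching one daemonized Snort per CPU, each pinned with taskset and given its own configuration and log directory:

# one Snort instance per CPU, each with its own config and log directory
taskset -c 0 snort -D -c /etc/snort/snort-0.conf -i eth2 -l /var/log/snort/instance-0
taskset -c 1 snort -D -c /etc/snort/snort-1.conf -i eth3 -l /var/log/snort/instance-1
# ...repeat for each 200-500Mbit/sec slice that your load-balancing scheme produces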

Suricata is a relatively new project that is well worth keeping an eye on. It has a multi-threaded architecture that makes effective use of multiple CPUs, but it is not as CPU-efficient as Snort as of Suricata 1.0.0. As such, Suricata on a large multi-core system is much faster than Snort running on a single CPU, but about 4x slower than many Snort instances running on that same multi-core system. As Suricata matures, performance will improve. Additionally, managing a single Suricata instance is simpler than managing many Snort instances. Update 2013-11: Suricata seems to have addressed its performance issues; it can now inspect several hundred Mbits/sec/core, which is on par with Snort.

Traffic Capture Frameworks

Snort is a modular system that supports many frameworks for capturing traffic, but not all of them scale equally well.

AFPACKET

The default capture framework on Linux since Snort 2.9, afpacket provides no features to load-balance traffic between multiple instances of snort running on multiple CPUs. As such, it can’t scale beyond 200-500Mbits/sec of throughput without some external technique to balance the load between several network interfaces. Even with this limitation, afpacket is the simplest and best choice for snort deployments with less than 200Mbits/sec of traffic.
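
For a small single-instance deployment on Snort 2.9, selecting afpacket is a one-line affair. A minimal sketch (the interface name is a placeholder, and while I believe the afpacket DAQ accepts a buffer_size_mb variable to enlarge its capture buffer, check the DAQ README for your version before relying on it):

snort -c /etc/snort/snort.conf --daq afpacket -i eth1 --daq-var buffer_size_mb=128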

Libpcap 0.9.x

The default capture framework on Linux for the Snort 2.8.x series and prior, libpcap is very similar to afpacket from a user perspective. It also lacks a built-in load-balancing feature, and can scale to a few hundred Mbits/sec of traffic. Consider upgrading Snort and using afpacket instead.

Libpcap >= 1.x.x

Around 1.0.0, libpcap introduced an mmapped feature designed to improve capture performance. Unfortunately the feature backfired and reduced performance due to a hard-coded buffer-size that is too small for most sites. Use afpacket instead unless you know what you’re doing.

PFRING and TNAPI/DNA

Pfring is a Linux kernel module that provides load-balancing through its ring clusters feature. It additionally supports several capture cards through its TNAPI/DNA high-performance drivers, which are available for $200-250 from the ntop store. Pfring, used in conjunction with a TNAPI-compatible network interface, is the least expensive method available to load-balance traffic to several instances of Snort running on several CPUs, and can scale to 10G on appropriate hardware.
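
The usual pattern is to start several Snort processes that all join the same ring cluster so that pfring flow-balances traffic among them. A rough sketch, assuming the pfring DAQ module is installed and accepts a clusterid variable as in the versions I’ve seen (verify the variable names against your pfring DAQ documentation; paths and interface names are placeholders):

# two instances sharing cluster 10 on the same interface; pfring balances flows between them
snort -D -c /etc/snort/snort-0.conf --daq pfring --daq-var clusterid=10 -i eth2 -l /var/log/snort/instance-0
snort -D -c /etc/snort/snort-1.conf --daq pfring --daq-var clusterid=10 -i eth2 -l /var/log/snort/instance-1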

High-Performance Capture Cards

Endace and other companies manufacture high-performance capture cards with integrated drivers that have load-balancing features. Depending on speed and features these cards can cost anywhere from $2,000-$25,000, and at the high end scale to 10Gbits/sec. Most of my high-performance Snort experience is on Endace hardware, which has its niggles but generally works very well.

Sourcefire 3D Hardware

Last, but certainly not least, Sourcefire sells Snort hardware that is throughput-rated and can simplify much of your planning. Managing a multi-Snort deployment is a lot of work, and Sourcefire has designed their systems to provide the power of Snort with an easy-to-manage interface, plus some features like RNA that are only available from Sourcefire. They’re more expensive than similar hardware running open-source Snort, but they may be more cost-effective in the long run unless your organization has a do-it-yourself culture with the time and technical expertise to tackle a complex open-source Snort deployment.

Traffic Management Techniques

The following traffic management techniques can be used in conjunction with the capture frameworks above to provide additional flexibility.

Hardware Load-Balancers

Gigamon, CPacket, and Top-Layer produce specialized network switches that can perform load-balancing to multiple network interfaces. The port-channeling feature of retired Cisco routers can be used to similar effect. These devices can be used to distribute traffic to multiple network interfaces in a single server or even to multiple servers, possibly scaling beyond 10G (I haven’t tested beyond 10G). I’ve worked with both Gigamon and Top-Layer hardware and found that they both do what they claim, although only Gigamon offers many 10Gbit/sec interfaces in one device. CPacket has been used by knowledgeable peers of mine and offers a unique feature that allows you to use any vanilla network switch to expand the port count of their load-balancer by using mac-address rewriting. These systems are fairly expensive, typically carrying 5-figure price tags, but often can be put to many uses in a large organization.

Manual Load-Balancing

Sometimes, traffic can be manually divided simply by configuring routers to send about half of your networks over one port and half over another. This “poor man’s” load-balancing can be cost-effective for links that are just a bit too large for one network interface.

Linux Bonded Interfaces

Bonding is the opposite of load-balancing: if you have several low-bandwidth interfaces that you would like to inspect without the overhead of managing multiple copies of Snort, you can use a bonded interface to aggregate them, as long as the total throughput isn’t more than a few hundred Mbits/sec.
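
A rough sketch of aggregating two monitoring ports into one bonded interface on Linux (interface names are placeholders, and the exact commands vary by distribution; newer systems replace ifenslave with ip/sysfs equivalents):

modprobe bonding              # creates bond0
ifconfig bond0 up
ifenslave bond0 eth2 eth3     # both capture ports now feed bond0
snort -c /etc/snort/snort.conf --daq afpacket -i bond0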

Sizing Hardware

Now that you know how many locations you need to place a server at, how many links there are to monitor at each location, and what capture-frameworks can work for you, it’s time to choose your servers.

CPU

A very rough and conservative rule of thumb is that Snort running on a single CPU can examine 200Mbits/sec of traffic without dropping an appreciable number of packets. Snort can examine 500Mbits/sec of traffic or even much more on a single CPU with the right networking hardware and a very small or very well-tuned ruleset, but don’t count on achieving that kind of throughput unless you have tested and measured it in your environment. Martin has posted a more detailed CPU sizing exercise on his blog if you’d like to dig a little deeper.

Remember that Snort is single-threaded. Unless you plan to use a load-balanced capture framework, single-CPU performance is more important than the number of cores. Conversely, if you know that you have lots of traffic to monitor, you’ll need a multi-core system paired with a load-balanced capture framework. Snort scales nearly linearly with the number of cores you throw at it, so don’t worry about diminishing returns as you add cores.

RAM

Each snort process can occupy 2Gbytes-5Gbytes of ram. How much depends on:

  • Traffic - The more traffic a sensor handles, the more state it must track. Stream5 can use anywhere from a few Mbytes to 1Gbyte to track TCP state.
  • Pattern Matcher - Some pattern matchers are very CPU efficient, and others are very memory efficient. The ac-nq matcher is the most cpu-efficient, reducing CPU usage by up to 30% over ac-split, but adding over 1Gbyte of ram usage per process.  The ac-bnfa matcher is quite memory efficient, reducing ram usage by several hundred Mbytes per process, but increasing CPU usage by up to 20%.
  • Number of rules - The more rules that are active, the more memory the pattern matcher uses.
  • Preprocessor configs - The stream5 memcap is one crucial factor for controlling memory usage, but all preprocessors occupy memory and many can be configured to be conservative or resource-hungry.

A Snort process inspecting 400Mbits/sec of traffic, with 7000 active rules, using the ac-nq pattern matcher (which is memory-hungry), and a stream5 memcap of 1Gbyte uses about 4.5Gbytes of RAM. With a smaller ruleset and the ac-bnfa pattern matcher (which is memory-efficient), I’ve observed snort processes use about 2.5Gbytes of RAM.
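
For reference, the knobs discussed above live in snort.conf. A hedged fragment roughly matching the memory-efficient configuration described here (the memcap value is in bytes and is only an example, not a recommendation):

config detection: search-method ac-bnfa
# 512Mbyte stream5 memcap; raise it toward 1Gbyte for higher traffic levels
preprocessor stream5_global: track_tcp yes, memcap 536870912
preprocessor stream5_tcp: policy first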

Note that the operating system and other applications will need some RAM as well, and if you don’t have unusual needs 2G is generally plenty. A detailed discussion of RAM sizing for the database is beyond the scope of this post, but generally for a multi-snort deployment it’s worth putting the database on a separate server that has 1-4Gbytes of RAM.

Disk Capacity and I/O

Snort generates very little disk I/O when outputting unified2 logs. Similarly, barnyard2 generates very little I/O when reading them. Any hard-disk configuration, even a single low-rpm disk, will meet Snort’s performance needs.

A detailed discussion of the database I/O needs is beyond the scope of this post. Again, most multi-snort sites should consider putting the database on a different server.  I/O needs will vary depending on the alert-rate, the number of users querying the database, and the front-end used, but in general a 4-disk raid-10 will suffice even for a large multi-gigabit deployment. Small sites with only a few hundred megabits/sec of traffic could even use a single-disk if it meets their availability requirements.

Administrative Network Interface

Snort doesn’t generate a notable amount of network traffic on the administrative interface unless you’re connecting to the database over a low-bandwidth wan-link. Any network interface that is supported under Linux will suffice for even the largest 10Gbit/sec deployments.

Capture Network Interfaces

Each site has widely varying requirements for capture interfaces, so it’s difficult to make generic recommendations. Consider the following factors:

  • Have enough servers to put one at each site where there is a link to be monitored.
  • Have enough interfaces in each server to monitor the number of links at its site.
  • Ensure that each interface is fast enough to monitor the link assigned to it without dropping packets.
  • If any individual link exceeds about 200Mbits/sec, employ a capture framework that features load-balancing and select a compatible interface.

PCI Bus Speed

At multi-Gbit/sec traffic rates, it is possible to saturate the PCI Express bus. Each PCI Express 16x slot has a bandwidth of 32Gbits/sec (4Gbytes/sec), 8x slots are half that, and 4x slots are half again. Theoretically, each slot has dedicated bandwidth such that two PCI Express 16x slots should have a combined bandwidth of 64Gbits/sec, but in practice the uplink between the PCI Express bus and the main memory bus is different in each motherboard chipset and may not be fast enough to provide the full theoretical bandwidth to every slot.

Bus saturation is only a potential issue at very high traffic rates, either involving multiple 10Gbit/sec links or inspection of a single 10Gbit/sec link with multiple sensor applications. Be prepared to split sensor functionality across multiple servers if testing shows unexpected performance bottlenecks that might be related to bus saturation. Hardware load-balancers such as those sold by Gigamon can be useful to duplicate and load-balance very high traffic rates to multiple 10Gbit/sec sensors.

Putting It All Together

There are many factors to consider above, but 80% or more of cases fall into a few broad classes that can be summed up briefly:

  1. One or two links, 200Mbits/sec or slower - Almost any server you buy today can handle this. Get 2-4 cores, 8 Gbytes of ram and 2-4 network interfaces of any type if you want to maximize your options.
  2. One or two links, 200-400Mbits/sec - You should consider multi-Snort load-balancing with PFRING or another suitable capture framework. If you’re going to try to feed this traffic to a single Snort instance in order to avoid the maintenance overhead of multi-Snort, get the highest-clocked CPU that you can find; otherwise any system with sufficient RAM will work well.
  3. One or two links, 500-1000Mbits/sec - You need multi-Snort; consider pfring with a TNAPI-compatible network interface listed on ntop.org. You’ll need 2-4 Snort processes, which means 10-20Gbytes of RAM and a quad-core system.
  4. One or two links, 1-10Gbit/sec - You definitely need multi-Snort with high-performance capture hardware. I’m partial to Endace, but pfring with a 10G TNAPI-compatible card should also work. You need one core and 4Gbytes of RAM for every 250Mbits/sec of traffic that you need to inspect (see the worked example after this list). Alternatively, consider a Sourcefire system. If you’re just getting started with Snort, this is going to be a big project to do on your own.
  5. Many links or greater than 10Gbit/sec traffic - Try to break the problem down into multiple instances of the above cases. A Gigamon box at each site may give you the flexibility that you need to split the problem across multiple servers effectively. You might also need a moderately high-performance database server, properly tuned and sized.
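
As a rough worked example of case 4, using the one-core and 4Gbytes per 250Mbits/sec rule of thumb, a link with a 2Gbit/sec daily peak works out to:

2000 Mbits/sec / 250 Mbits/sec per instance = 8 Snort instances
8 instances x 1 core   = 8 cores   (plus a core or two for the OS and barnyard2)
8 instances x 4 Gbytes = 32 Gbytes of RAM (plus ~2 Gbytes for the operating system)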

Wrapping Up

Good luck with your new Snort server. Now go get some rules:

  • Emerging Threats: Excellent for detecting trojans and malware that have successfully compromised systems on your network and are “phoning home”. The ET rules are available free of charge and anyone can contribute fixes or new rules if you find a gap or problem with the ruleset.
  • VRT Subscriber Feed: Excellent for detection of exploits and attacks before they become compromised systems. The subscriber feed is developed and maintained by the experts on Sourcefire’s Vulnerability Research Team, and they charge $30/yr for a personal subscription or $500/yr for a business subscription.
  • VRT Registered Feed: The registered feed contains the same rules as the subscriber feed, but updates are released 30-days after subscribers receive them. The registered feed is a reasonable alternative for personal use, but if you’re protecting a business I recommend the subscriber feed.
  • ETPro: ETPro aims to supplement the ET community sigs with attack/exploit sigs similar to what the VRT provides. Pricing is $35/yr for personal use or $350/yr for businesses. I haven’t used it, though it’s on my todo list to try.

Once you’ve got things running, consider reading my slides on monitoring Snort performance with Zabbix to see how well you sized your system.

License and Feedback

If you find errors in this guide or know of additions that would improve it, leave a comment below.

Creative Commons License Capacity Planning for Snort IDS by Mike Lococo is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Permissions beyond the scope of this license may be available at http://mikelococo.com/2011/08/snort-capacity-planning/#license.

If you’d like to reuse the contents of this post but the cc-by-sa license doesn’t work for you for some reason, I’m happy to discuss offering the contents of this guide under almost any reasonable terms at no cost to individuals and corporations alike. Whether you work for Sourcefire, the OISF, or are just another community member writing a Snort guide, I’m happy to work something out that lets you use any portion of this post you need. Leave a comment below or contact me using the information on the about page if you’d like to discuss.

Monitoring Snort Performance with Zabbix

In January I gave a presentation to the REN-ISAC on how to monitor the performance of Snort IDS systems. It covers:

  • A comparison of high-performance capture-frameworks like vanilla-libpcap vs pfring vs dedicated capture cards from Endace or similar.
  • An overview of the perfmon preprocessor and the --enable-perfprofiling configure-option that allow Snort to log useful performance metrics (see the configuration sketch after this list).
  • A very brief overview of Zabbix as a system-monitoring framework, followed by some worked-examples of actual snort problems that are analyzed using data collected by Zabbix.
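
A minimal sketch of the Snort-side configuration the presentation assumes (the file path and intervals are placeholders; rule profiling additionally requires that Snort was built with --enable-perfprofiling):

preprocessor perfmonitor: time 300 file /var/log/snort/perfmon.stats pktcnt 10000
config profile_rules: print 20, sort avg_ticks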

The presentation video is available only to REN-ISAC members, but I’m making the slides and my notes available here. They’re a bit rough, but if I get questions in the comments here or on the snort-users mailing list I’ll try to be helpful. If there’s enough confusion based on the roughness of the slides, and enough interest to warrant it, I can expand the presentation into a series of blog posts to clarify some of the points that are unclear. In the meantime, have a look at an article by Juliet Kemp, Use Profiling to Improve Snort Performance, which fleshes out some of this material.

Update 2012-02-09: The notes for slide 16 of this deck assert that the averaging period for the “Drop Rate” provided by perfmon is the lifetime of the Snort process instead of the data-collection period, and warn of the type of confusion this can cause. That note is incorrect; the Drop Rate is averaged over the data-collection period like everything else. The misinformation was from my own research, dating back several versions of Snort. It’s possible that my original analysis was mistaken, or that I noticed a bug which has since been quietly fixed.

Relaunch

My RSS subscriber has reminded me that my feeds have gone a little bit wild over the last couple of days as I’ve gone through old posts to retag and update dead links. That’s all finished now, promise. I’ve been cleaning house and generally updating things…

  • I’ve migrated from Laughing Squid shared hosting to a linode virtual private server. Laughing squid has been good to me over the years, but I’ve gotten to the point where I really want a root shell on my web-server. And VPS’s have gotten cheap enough now to be a reasonable option.
  • I’ve retired the WuCoco Theme. It’s also had a good run, but I don’t have time or interest in bringing it up to date with the latest WordPress features. I’m now using a mildly tweaked version of Twenty Ten, which will likely continue to evolve.
  • General post, page, and link pruning.

This move should lay the groundwork to roll out some of the genealogy resources I’ve talked about wanting to make. When I’m done, there should be some mailing-list archives, a wiki, and maybe an image gallery… hopefully all authenticated via OpenID. We’ll see how it pans out, but preliminary tests have gone well so far.

Virtualization and Security Boundaries

Virtualization security is coming up frequently in higher-ed security forums as folks scramble to understand best practices before whatever path-of-least-resistance gets too entrenched to change. Unfortunately, there are almost no intermediate-level documents on virtualization security to help us wrap our heads around the problem. There are plenty of introductory documents rehashing the same six bullet points over and over, and there is quite a lot of deep-dive technical material on various details, but almost no technical survey material for folks looking to bootstrap themselves on the topic.

I gave a presentation on virtualization security at the Educause Security Professionals Conference in April, and there seemed to be agreement and frustration about the lack of available survey material, which gave me the motivation I needed to polish up this paper for release. It includes a basic taxonomy of virtualization technologies for security practitioners, an overview of attacks in virtualized environments, and a list of best practices with links to more detailed documents, and it identifies areas where best practices haven’t yet been established.

Read Virtualization and Security Boundaries.

Fedora 8 on a Dell Latitude D620

Fedora 8 works quite well on the D620 right out of the box, and with a few tweaks can be just about fully supported. This guide summarizes what I’ve done to get things working to my satisfaction. It is not a step-by-step howto, but does attempt to link to more detailed resources when they are available. The table below shows at a glance what is and isn’t working well on my system. Most items worked immediately after install without manual intervention; italic items were made fully functional after some manual configuration; bold items have significant unsolved issues associated with them.

  • Dual-core Processor: Both cores are detected on the 2.17GHz Intel Core Duo processor, the 32bit i686 smp kernel is installed and just works. Dynamic CPU frequency scaling works well and if you wish to monitor/change the scaling behavior there’s a gnome panel applet to do so. I haven’t tried the 64 bit build, but most reports I’ve read indicate that it works well.
  • USB: Works, no config needed.
  • PCMCIA Slot: Works, no config needed.
  • Touchpad/Track Stick: Works, no config needed. Install gsynaptics from Extras if you want to customize the trackpad behavior.
  • Suspend to Ram: How this works will depend on the options you ordered your laptop with and the video drivers you’re using. My understanding is that it will work fine out of the box with Intel graphics or if you’re using the open source NV drivers with your NVidia card, however both of these options have fairly poor 3D performance. With some tweaking, suspend can also be made to work when using the proprietary drivers that will allow you to have strong 3D performance with one of the Quadro cards offered in the D620. Classohm.org has detailed instructions (for the D800/F7, but they apply here as well). Since the NVidia cards offered in the D620 are PCI-E and not AGP, you can skip the AGP tweaks and just create the scripts in /etc/pm/config.d/. I’ve tested suspend with kernel 2.6.23.9-85/smp.
  • Hibernate to Disk: Untested, please report if this works out of the box or if you have a link with instructions on any required tweaking.
  • Ethernet: Works, no config needed.
  • Wireless Networking: I didn’t have to jump through any hoops, other than to enable the network manager applet in order to avoid using iwconfig from the terminal all the time. Dawid Lorenz reports trouble that he solved by switching from the built-in iwl3945 to the freshrpms ipw3945 drivers, which have worked well for me on previous versions of Fedora. Note that the awful Broadcom 4310 required ndis-wrapper to be supported in past versions of Fedora; I’m not certain what state it’s in today.
  • Bluetooth: Works, no config needed.
  • 2D Video: Works, no config needed.
  • 3D Acceleration: The NVidia Quadro 110M works well after installing nvidia-x11-drv from freshrpms (see the commands after this list), bumping glxgears performance from ~900fps to ~2300fps. Don’t forget to install kernel-devel for your kernel version and reboot. Note that installing the proprietary drivers will bork suspend/resume until you fix it using the instructions above.
  • External Monitor: If all you want is to switch to the external output instead of the internal LCD, you can do so easily right out of the box. Use the screen resolution control panel to set your resolution, and Fn-F8 to toggle between the displays. If you choose to install the NVidia driver, it includes a simple dialog for setting up multimonitor support using TwinView. TwinView isn’t perfect, windows maximize dumbly (across both displays) and if the resolutions of the two monitors are mismatched there’s an area where it’s possible to move the mouse and place windows that doesn’t show up in any monitor. All in all, it’s a bit lame but does get the job done in a pinch.
  • CD/DVD Burning: Works out of the box. At one time this tweak substantially improved burn speed and system responsiveness while burning, I haven’t tested to determine if it’s still needed. Post a comment if you’ve done testing.
  • Sound Playback: Works, no config needed.
  • Sound Recording: Works, no config needed. If you’re not getting recorded sound, check the Volume Control app to make sure that capture is enabled and the recording level isn’t way down.
  • Volume Keys: Now work out of the box. You can mess with the key bindings in System –> Preferences –> Keyboard shortcuts but they control volume without any tweaking now.
  • Radio On/Off Switch: Works fine, and has a noticeable effect on battery life. You may need to “up” the interface with the connection manager of your choice if you enable the radio while the system is running.
  • ACPI Power Management: All the power management features work (fan speed autoadjusts, cpu frequency scaling works, there’s a gnome applet to easily control it).
  • Fingerprint Reader: Untested.
  • Modem: Untested.
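
For the 3D Acceleration item above, the install boils down to a couple of commands, assuming the freshrpms repository is already configured (package names are as described above and may have changed since):

yum install kernel-devel nvidia-x11-drv
reboot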

Output of lspci

00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express PCI Express Root Port (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01)
00:1c.2 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
00.0 VGA compatible controller: nVidia Corporation G72M [Quadro NVS 110M/GeForce Go 7300] (rev a1)
03:01.0 CardBus bridge: O2 Micro, Inc. OZ601/6912/711E0 CardBus/SmartCardBus Controller (rev 40)
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5752 Gigabit Ethernet PCI Express (rev 02)
0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02)

Multihop SSH with Putty/WinSCP

Introduction

It’s not always possible to ssh to a host directly. Many networks require high-value systems to be accessed via an intermediate bastion/proxy host that receives extra attention in terms of security controls and log monitoring. The added security provided by this connection bouncing comes with a cost in convenience, though. It requires multiple logins to access the protected systems and substantially complicates scp/sftp file transfers.

Fortunately, there are a number of ways to automate connection bouncing and make it as convenient as a direct connection. There are already a number of web sites detailing the approaches to this issue, and I won’t repeat their contents; to get a broad overview of the topic, read the following:

Terminology

SSH Connection “Chaining”

Connection “chaining” refers to any approach that involves sshing to an intermediate host, and then sshing from the intermediate host to the next host (for example: ssh 1 ‘ssh 2 “ssh 3”’). This solution is attractive for setups with many hops because it’s easy to extend, for example sshto makes this very easy. The primary disadvantage is that end-to-end encryption is lost. The connection is decrypted by every host in the chain, and an attacker with sufficient privilege on an intermediate system can sniff the connection without compromising either of the endpoints. I consider this to be a significant failing, and have a strong preference for “stacked” connections wherever they are logistically feasible.
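
For concreteness, a chained connection with OpenSSH looks something like the following (hostnames are placeholders; -t forces a tty so the nested ssh password prompts work):

ssh -t bastion1.example.com ssh -t bastion2.example.com ssh target.example.com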

SSH Connection “Stacking”

Connection “stacking” refers to any solution that involves tunneling ssh connections inside each other. “Nesting” strikes me as a better term, but stacking seems to be more widely agreed upon. It is typically implemented with proxy-commands or with ssh port-forwarding. It can be more difficult to manage for connections with many hops, and it forces one of the endpoints to bear the encryption load of all the connections (in chained setups, the load is spread evenly among all the hosts in the chain). It does maintain end-to-end encryption, preventing connection/credential sniffing by intermediate hosts.
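
The setup described below uses Putty/plink and WinSCP; for comparison, the equivalent stacked connection with OpenSSH is a ProxyCommand entry in ~/.ssh/config (hostnames are placeholders, and the -W option requires OpenSSH 5.4 or newer):

# ~/.ssh/config
Host target.example.com
    ProxyCommand ssh -W %h:%p bastion.example.com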

My Setup

The key properties for my setup are:

  • End-to-end encryption is maintained using stacked connections
  • For putty, multiple hops are supported via saved sessions. For WinSCP, only a single hop is supported.
  • Putty is used for shell connections, and WinSCP is used for scp/sftp connections
  • No special software is required beyond a default Putty installation, WinSCP, and an SSH server with port forwarding enabled. Specifically, netcat is not required on the intermediate host as is common with ProxyCommand setups.

WinSCP Config

The WinSCP Config is quite simple and utilizes its “tunnel” feature. Open WinSCP and configure a saved session for the final destination host as follows:

  1. On the Session page, fill in the hostname and user name for the final destination host. Leave the password blank.
  2. Check the “Advanced options” box in the login dialog.
  3. Select the Connection –> Tunnel page.
  4. Check the “Connect through SSH tunnel” box.
  5. Fill in the Host name and user name of the intermediate host. Leave the password blank.
  6. Save the session using the button in the lower right-hand corner of the window.

When you log in using the new profile, you will be prompted for two passwords. The first is for your account on the intermediate host, and the second is for your account on the final-destination host. After login, the bounce is entirely transparent and WinSCP works as if you had connected directly to the final-destination host. The connection process can be made even more transparent and secure by using public key authentication with Pageant instead of passwords.

Putty Config

The Putty setup is slightly more complicated and requires that public key authentication be used on the intermediate host. It utilizes Putty’s “local proxy” feature, which allows you to specify an arbitrary command on the local machine to act as a proxy. Instead of creating a TCP connection, PuTTY will communicate using the proxy program’s standard input and output streams. Our local proxy will be plink, which is a command-line ssh connection program included in the Putty default installation. Plink’s -nc option provides functionality similar to the ProxyCommand/netcat approach, but does so using the ssh server’s native port-forwarding interface and does not require that netcat be installed on the intermediate system. To set things up, configure a saved session for the final destination host:

  1. Configure public key authentication for the intermediate host and make sure it works.
  2. Start putty and on the “Session” page of the “Putty Configuration Dialog” that appears, fill in the host name and user name for the final destination host.
  3. Switch to the Connection –> Proxy page, select “Local” as the proxy type, and enter the following as the local proxy command: plink.exe intermediate.proxy.host -l username -agent -nc %host:%port
  4. Save the session.

Remember to replace “intermediate.proxy.host” with your intermediate hostname and “username” with your real username. “%host” and “%port” are variables that putty will feed to plink at connection time, so leave those as is. If all is working properly, when you log in using the new profile plink will handle the login to the intermediate system silently. Putty isn’t smart enough to prompt if the proxy command requires user input, so you’ll get a connection error if public key authentication on the intermediate host isn’t working. If you use password authentication on the final destination host you’ll be prompted for your password, or if you use pubkey authentication there as well you’ll land at a prompt with no fuss at all.

If you have trouble, make sure plink is executing properly. You may need to enter the full pathname, usually c:\program files\putty\plink.exe. You can also try executing the plink command from a prompt, remembering to substitute the %host and %port values of your final destination host. If it’s working properly, you’ll be logged into the intermediate system with your pageant-cached private key, and instead of a prompt you’ll be presented with the SSH banner for your final destination system.

Before attempting multiple hops, get a single-hop setup up and running. Additional hops simply extend the same concepts using saved sessions. By way of example, assume that you have 2 hops on the way to a server: PuttyClient -> IntermediateServer01 -> IntermediateServer02 -> FinalServer

  • Create a saved session for IntermediateServer01 with no proxy config, since it’s directly accessible from your PuttyClient workstation.
  • Create a saved session for IntermediateServer02 with a local proxy command of plink -load server01 -nc %host:%port
  • Create a saved session for FinalServer with a local proxy command of plink -load server02 -nc %host:%port. When you attempt to SSH to FinalServer with this saved profile, plink will try to set up the proxy using the profile for server02, which has a local proxy command that calls back to the server01 config. This technique can be extended to any number of hops; thanks to Tim for pointing it out in the comments.