<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<meta name="dc:creator" content="J&#246;rn Nettingsmeier 
&lt;nettings@folkwang-hochschule.de&gt;"/><meta name="dc:publisher" 
content="University of Duisburg-Essen, Dept. of Computer Science"/><meta 
name="dc:subject" content="Introduction to Internet Worms"/><meta 
name="dc:description" content="Course presentation on computer worms, and their 
replication mechanisms"/><meta name="dc:date" content="2004-03-23"/><meta 
name="dc:type" content="Collection"/><meta name="dc:format" content="text/xml"/>
<meta name="dc:identifier" 
content="http://spunk.dnsalias.org/public_stuff/cs_papers/Worms/"/><meta 
name="dc:language" content="en"/><meta name="dc:relation" content="Collection"/>
<meta name="dc:rights" content="(c) 2004 J&#246;rn Nettingsmeier - may be freely 
redistributed and modified. Credit is welcome :)"/>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

<link rel="stylesheet" type="text/css" href="worms.css" />

<title>Introduction to Internet Worms</title>

</head>

<body>

<div class="nav">
<a href="worms-1.xml">Previous</a> |
<a href="worms.xml">Table of Contents</a> |
<a href="worms-3.xml">Next</a>
</div>


<h1>Worm Design</h1>

<div>
<p>
In the following section, we discuss some design goals a worm writer might 
consider. These goals are not completely orthogonal: some work hand in hand, 
while others directly contradict each other. Therefore, worm design involves a 
number of trade-offs, notably the already mentioned "speed vs. stealth", as well 
as "size vs. complexity".
</p>
<p>
Afterwards, we take a detailed look at scanning algorithms. Past worm epidemics 
have shown that the scanning method is the most crucial factor in a worm's 
success. Unlike the exploit, which is largely predetermined by a flaw outside 
the worm writer's control, the choice of scanning method is entirely in his or 
her hands and leaves the most room for improvement and variation.
</p>
</div>

<h2>Design goals</h2>

<h3>Saturation</h3>
<div>
<p>
Virtually all worms strive for <span class="keyword">saturation</span>, i.e. the 
infection of all vulnerable hosts on the entire internet. The only exceptions to 
this rule are targeted worm attacks (see "Controllability" below).<br />
To achieve saturation, the worm must make sure that its scanning method 
eventually probes all potential targets, and it must be tolerant of the loss of 
worm instances due to detection, hardware failure or network outages.
</p>
</div>

<h3>Stealth</h3>
<div>
<p>
A worm writer can choose to design for stealth in the hope that the worm will be 
detected late or not at all. S/he does this by scanning slowly and irregularly, 
and by keeping the worm code as small as possible, hoping that the scanning and 
attacking activity will be hidden from human eyes among legitimate traffic.<br />
The problem is that once the worm's <span class="keyword">signature</span> (a 
string identifying the worm that can be detected and filtered by firewalls) is 
known and filters are in effect, the game is basically over.<br />
To work around this, the worm could be encrypted or made polymorphic, but both 
approaches are very hard to implement, bloat the code and will likely raise CPU 
usage considerably, which again reduces stealth.</p>
</div>

<h3>Speed</h3>
<div>
<p>
Instead of stealth, most worms seen in the wild are optimized for speed. They 
try to scan aggressively and as fast as possible, assuming that initially no 
automatic filtering is possible, since the worm signature has not yet 
been determined. By the time filters are in effect (which normally requires 
human intervention), the worm hopes to have reached saturation.
</p>
<p>
Since network administrators are estimated to need at least 24 hours on average 
to deploy filters manually, this assumption holds in most cases.
</p>
<p>
The limiting factors of speed are CPU usage on the infected host, bandwidth 
consumption and scanning efficiency. Ideally, each worm instance would emit 
scans at 100 per cent of the bandwidth available to the infected host (which is 
only possible if the CPU is not fully loaded) and at the same time avoid 
duplicating the work of other instances, i.e. not scan and attack hosts that 
other instances are already working on.
</p>
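<p>
To make these limits concrete, the following back-of-the-envelope sketch 
estimates how long a single instance would need to probe every IPv4 address 
once if bandwidth, not CPU, is the bottleneck. The 40-byte probe size (a bare 
IPv4 plus TCP header, roughly a SYN packet) and the two link speeds are 
illustrative assumptions, and the function names are made up for this example.
</p>
<pre>
```python
# Back-of-the-envelope estimate: time for ONE instance to probe the whole
# IPv4 space once, assuming bandwidth (not CPU) is the bottleneck.
# Assumption: each probe is a 40-byte packet (minimal IPv4 + TCP headers);
# link-layer overhead is ignored.

IPV4_SPACE = 2 ** 32
PROBE_BYTES = 40


def probes_per_second(bandwidth_bits_per_s: float) -> float:
    """Maximum probe rate a link of the given capacity can sustain."""
    return bandwidth_bits_per_s / (PROBE_BYTES * 8)


def sweep_days(bandwidth_bits_per_s: float) -> float:
    """Days needed to send one probe to every IPv4 address."""
    seconds = IPV4_SPACE / probes_per_second(bandwidth_bits_per_s)
    return seconds / 86400


for label, bps in [("1 Mbit/s", 1e6), ("100 Mbit/s", 1e8)]:
    print(f"{label}: {probes_per_second(bps):,.0f} probes/s, "
          f"{sweep_days(bps):.1f} days per full sweep")
```
</pre>
<p>
At 1 Mbit/s this works out to 3,125 probes per second and roughly 16 days per 
full sweep, which is why a worm relies on many instances working in parallel, 
and why duplicated work between them wastes so much of its potential speed.
</p>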
</div>

<h3>Size</h3>
<div>
<p>
Keeping the worm code small is obviously beneficial to almost all other design 
goals, but it usually prevents the worm from using advanced techniques such as 
polymorphism or encryption, or from carrying hit lists.
</p>
<p>
One approach to reducing code size is to download the scanning code and/or hit 
lists after the infection. This is very hard to do in practice, 
since the code cannot simply be pulled from one specific compromised website: 
apart from possibly providing clues to the identity of the worm's originator, 
such a site would constitute a single point of failure for the worm.<br />
The only viable solution is to create a fault-tolerant and untraceable network 
of download sites and somehow distribute access information among the worm 
instances, which is just as hard as it sounds.
</p>
</div>


<h3>Platform independence</h3>
<div>
<p>
Since modern server software is deployed on a wide range of hardware and 
operating systems, it is very attractive for a worm writer to find an 
exploit at a high enough level of abstraction to be platform-independent.<br />
Luckily, this is very rare. The exploit tactic of choice, the buffer 
overflow attack, requires architecture-specific code and is very sensitive to 
memory layout, which can vary even on machines of the same architecture. Some 
CPUs (notably the SPARC with its non-executable stack) are more or less immune 
to simple buffer overflows anyway and require more advanced exploitation 
techniques.<br />
A cross-platform exploit usually requires a higher-level, and 
therefore more obvious, security hole.
</p>
</div>

<h3>Controllability</h3>
<div>
<p>
With all the recent propaganda about "cyber-terrorism", "information 
warfare" and other spectres of American homeland security angst, it seems 
worthwhile to consider "surgical" worms for use as weapons: worms that can be 
precisely targeted in advance and behave predictably.
</p>
<p>
Depending on the scanning method and the nature of the target, it can be 
relatively easy to limit the worm's propagation, especially if the target is 
on one contiguous subnet. Thus an attack against a particular university or 
business network would be feasible.<br />
We have also seen worms with a built-in "best-before" date, after which they 
ceased to operate (cf. Code Red II), as well as worms that leave back-doored 
hosts which sleep and wait for new orders from the worm originator. Such 
mechanisms might be employed in information warfare to set "time bombs" or to 
turn target hosts into remote-controlled agents.
</p>
<p>
However, experience with so-called "benevolent" or patch worms that attempt 
to close security holes automatically (an act that, since it happens without 
the machine owner's consent, is also illegal and counts as computer sabotage in 
most countries) has shown that programming mistakes and incorrect assumptions 
can easily make such worms get out of hand and, through unwanted side-effects, 
do more harm than good.<br />
Malevolent worms are subject to the same imponderables.
</p>
<p>
Since worms require security flaws to be present (a factor which 
we, the potential victims, can control to a great degree, at least if we are 
using open-source software), the rhetoric about cyber-terrorism threats is 
easily identified as yet another attempt to justify further restrictions on 
individual freedoms. So far, the worst that has happened to us is a day 
off the net, or three days without electricity. No lives have yet been lost due 
to worms or computer sabotage, and if any are, it will be more a sign of design 
mistakes in the underlying systems than of the potential of cyber-terrorism.<br />
On the other hand, the economic consequences of targeted attacks are obvious.
</p>
</div>

<h2>Scanning algorithms</h2>

<h3>Random scanning</h3>
<div>
<p>
The most basic (and obvious) scanning method is to randomly scan the entire IP 
address range (which has 2<sup>32</sup> host addresses). All hosts will 
eventually be scanned, and the algorithm is simple, compact and very tolerant of 
the loss of worm instances. But there is no cooperation between worm instances, 
so many hosts will be scanned more than once.<br />
Random scanning worms show pandemic spread.
</p>
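<p>
The cost of this missing cooperation can be quantified. If k probes are drawn 
uniformly at random from N addresses, the expected number of distinct addresses 
hit is N(1 - (1 - 1/N)<sup>k</sup>), so covering a fraction f of the space takes 
about N ln(1/(1 - f)) probes. The sketch below compares this formula with a 
small simulation on a toy space of 2<sup>16</sup> addresses; the space size, 
seed and helper names are arbitrary choices for illustration.
</p>
<pre>
```python
import math
import random

# Toy address space; the real IPv4 space (2**32) behaves the same way
# but is too large to simulate cheaply.
N = 2 ** 16


def expected_distinct(k: int, n: int = N) -> float:
    """Expected number of distinct addresses hit by k uniform random probes."""
    return n * (1 - (1 - 1 / n) ** k)


def simulated_distinct(k: int, n: int = N, seed: int = 1) -> int:
    """Count the distinct addresses actually hit by k random probes."""
    rng = random.Random(seed)
    return len({rng.randrange(n) for _ in range(k)})


for k in (N, 3 * N):
    print(f"{k} probes: expected {expected_distinct(k):.0f}, "
          f"simulated {simulated_distinct(k)} of {N} addresses")

# Probes needed to cover 99% of the space: about N * ln(100), i.e. ~4.6 N.
print(f"99% coverage needs ~{N * math.log(100):.0f} probes")
```
</pre>
<p>
Sending as many random probes as the space has addresses reaches only about 63 
per cent of them, and 99 per cent coverage costs roughly 4.6 times that many 
probes.
</p>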
<p>
If vulnerable hosts are scarce on the network, the approach is very inefficient. 
Once IPv6, with its address range of 2<sup>128</sup>, is widely deployed, random 
scanning will quickly disappear, but in the current situation, with a rather 
small address range and many similar vulnerable machines (thanks to the 
omnipresence of Windows), it remains the method of choice.
</p>
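<p>
The scarcity argument is simple arithmetic: with V vulnerable hosts spread 
uniformly over an address space of size N, a random probe succeeds with 
probability V/N, so an instance needs N/V probes on average per new victim. The 
sketch assumes a hypothetical population of 100,000 vulnerable hosts and a rate 
of 10,000 probes per second per instance; both numbers are illustrative.
</p>
<pre>
```python
# Expected random probes per successful hit: N / V (geometric distribution).
# Assumptions: V vulnerable hosts, spread uniformly; illustrative numbers only.

V = 100_000            # hypothetical number of vulnerable hosts
RATE = 10_000          # hypothetical probes per second per instance
SECONDS_PER_YEAR = 365 * 86400


def probes_per_hit(address_bits: int, vulnerable: int = V) -> float:
    """Mean number of uniform random probes needed to find one victim."""
    return 2 ** address_bits / vulnerable


for name, bits in (("IPv4", 32), ("IPv6", 128)):
    probes = probes_per_hit(bits)
    seconds = probes / RATE
    print(f"{name}: {probes:.3g} probes per hit, "
          f"{seconds:.3g} s ({seconds / SECONDS_PER_YEAR:.3g} years)")
```
</pre>
<p>
Under these assumptions, an IPv4 instance finds a new victim every few seconds, 
while an IPv6 instance would need on the order of 10<sup>22</sup> years.
</p>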
</div>

<h3>Topological scanning</h3>
<div>
<p>
Topological scanning relies on information gathered from the infected host. Say 
we have exploited a hole in a popular SMTP server. We can then parse the 
logs on that host for other SMTP servers it has interacted 
with and attack those directly, rather than scanning at random and then testing 
whether the target actually runs SMTP.<br />
The proliferation pattern is epidemic.
</p>
<p>
Since topological scanning implies parsing information in various formats, it 
is more complicated than any algorithmic scanning method, and the code 
will be bigger. But it is considerably more efficient, since most non-vulnerable 
hosts are ruled out in advance, and the resulting traffic will likely be 
inconspicuous. It may, however, fail to reach saturation if there are islands of 
hosts that are not linked to already infected ones by topology information.
</p>
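<p>
The saturation problem can be illustrated with a toy graph: a topological worm 
can only follow links found in its victims' data (for example, mail logs), so a 
group of hosts that no infected host knows about is never reached. The graph and 
host names are invented for this sketch, and plain breadth-first search stands 
in for the worm's propagation.
</p>
<pre>
```python
from collections import deque

# Hypothetical "who talks to whom" graph, e.g. derived from mail logs.
# Hosts F and G form an island: no log on A..E mentions them.
TOPOLOGY = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B"],
    "E": ["C"],
    "F": ["G"],
    "G": ["F"],
}


def reachable(graph: dict, start: str) -> set:
    """Breadth-first search: every host a topological worm could reach."""
    seen = {start}
    queue = deque([start])
    while queue:
        host = queue.popleft()
        for neighbour in graph.get(host, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


infected = reachable(TOPOLOGY, "A")
missed = set(TOPOLOGY) - infected
print("infected:", sorted(infected))      # A, B, C, D, E
print("never reached:", sorted(missed))   # F, G
```
</pre>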
<p>The Morris Worm (cf. [Spafford1988]) had to rely on 
topology analysis, since at the time the IP address range was so sparsely 
populated that random scanning would have been a huge waste.<br />
With the advent of IPv6, topological scanning will likely supersede random 
scanning and once again become the method of choice.
</p>
</div>

<h3>Weighted random scanning</h3>
<div>
<p>
The weighted random approach combines some of the advantages of random and 
topological scanning: the random address generator is modified so that 
addresses on the same subnet are given a higher probability. These are usually 
closer in terms of router hops, so preferring them both speeds up the 
scanning and keeps the load on the backbones down, reducing the risk of quick 
detection. As long as the algorithm also probes addresses outside the local 
subnet, the spread will be pandemic (cf. Code Red II case study below).
</p>
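<p>
A minimal sketch of such a biased address generator, using a toy space of 256 
subnets with 256 hosts each. The local-preference probability of 0.5 is an 
illustrative assumption, not the weighting of any particular worm, and the names 
are hypothetical.
</p>
<pre>
```python
import random

SUBNETS = 256          # toy space: 256 subnets ...
HOSTS = 256            # ... of 256 hosts each
P_LOCAL = 0.5          # assumed probability of staying on the local subnet


def next_target(own_subnet: int, rng: random.Random) -> tuple:
    """Pick a (subnet, host) pair, preferring the scanner's own subnet."""
    if rng.random() >= P_LOCAL:
        subnet = rng.randrange(SUBNETS)   # anywhere in the space
    else:
        subnet = own_subnet               # same subnet, fewer router hops
    return subnet, rng.randrange(HOSTS)


rng = random.Random(42)
targets = [next_target(7, rng) for _ in range(10_000)]
local = sum(1 for subnet, _ in targets if subnet == 7)
print(f"{local / len(targets):.1%} of probes stay on subnet 7")
```
</pre>
<p>
With this weighting, about half of all probes, plus the small share of random 
picks that happen to land locally, stay on the scanner's own subnet, while the 
rest still reach the whole space.
</p>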
</div>

<h3>Hit-list scanning</h3>
<div>
<p>
To avoid the disadvantages of scanning entirely, a list of vulnerable hosts can be 
composed in advance and sent along with the worm.
The list data can be gathered surreptitiously over a long period of time, so that 
the scans will not stand out from the everyday port-scanning activity of script 
kiddies and curious netizens. When the actual attack starts, there is no scan 
traffic left that might betray the worm, and each infection attempt will hit home.
</p>
<p>
The interesting part here is the handling of the hit list. It will be huge (a 
few hundred kilobytes at least), and it must be divided among worm instances so 
that duplicate infection attempts are avoided. At the same time, a certain 
amount of redundancy is necessary in case a worm instance is lost, and with 
it part of the hit list.
</p>
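<p>
One simple way to divide the list, sketched under the optimistic assumptions 
that every infection attempt succeeds and no instance is lost: each instance 
infects the first entry of its list, hands half of the remainder to its new 
victim and keeps the other half. The number of active instances doubles each 
generation and no address is attempted twice. The redundancy that would guard 
against lost instances is deliberately left out to keep the sketch short, and 
the function name is hypothetical.
</p>
<pre>
```python
# Divide-and-conquer hit list: a host holding list L infects L[0], hands
# the newly infected host the first half of the rest and keeps the second.
# Assumptions: every attempt succeeds, no instance is lost, one infection
# per instance per generation.


def spread(hit_list: list) -> tuple:
    """Return (generations, set of infected hosts) for a full run."""
    holders = [hit_list]          # lists held by currently active instances
    infected = set()
    generations = 0
    while any(holders):
        generations += 1
        next_holders = []
        for lst in holders:
            if not lst:
                continue          # this instance has nothing left to do
            target, rest = lst[0], lst[1:]
            assert target not in infected, "duplicate infection attempt"
            infected.add(target)
            half = len(rest) // 2
            next_holders.append(rest[:half])   # handed to the new victim
            next_holders.append(rest[half:])   # kept by the infector
        holders = next_holders
    return generations, infected


gens, hosts = spread(list(range(1000)))
print(gens, len(hosts))
```
</pre>
<p>
For a 1,000-entry list this prints "10 1000": all 1,000 hosts are infected in 
10 generations, roughly log<sub>2</sub> of the list length.
</p>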
<p>
Hit-list worms will spread orders of magnitude faster than normal scanning 
worms, and allow for precise targeting in advance. So far, no widespread 
hit-list worm has been observed in the wild.
</p>
</div>

<h3>Permutation scanning</h3>
<div>
<p>
Every worm instance carries the same fixed pseudo-random permutation of all IP 
addresses. It can easily be produced algorithmically by a small 32-bit block 
cipher, so there is no need to haul along a huge text file.<br />
Scanning starts at a randomly chosen position in this sequence and then 
continues along it. If the worm instance hits a system that has already been 
infected, it can safely assume that another instance is already working on this 
section of the sequence, so it jumps to another random starting point.
</p>
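<p>
A toy simulation of the idea on a 2<sup>16</sup>-address space. In place of a 
32-bit block cipher, the shared permutation here is the affine map 
i -> (A*i + C) mod 2<sup>16</sup> with odd A, which is a bijection but has none 
of a cipher's pseudo-randomness; the constants, the number of vulnerable hosts 
and the function names are illustrative assumptions. Each instance walks the 
sequence from a random position, jumps to a new random position when it finds an 
already infected host, and every new infection starts another instance at a 
random position.
</p>
<pre>
```python
import random

BITS = 16
N = 2 ** BITS          # toy address space instead of 2**32
A, C = 40503, 12345    # illustrative constants; A must be odd for a bijection


def perm(i: int) -> int:
    """Shared permutation of 0..N-1 (affine map, NOT a real block cipher)."""
    return (A * i + C) % N


def simulate(n_vulnerable: int = 500, seed: int = 3, budget: int = 50 * N):
    """Run toy permutation scanning until all vulnerable hosts are infected.

    Returns (total probes, infected count, vulnerable count).
    """
    rng = random.Random(seed)
    vulnerable = set(rng.sample(range(N), n_vulnerable))
    infected = {rng.choice(sorted(vulnerable))}   # the initially infected host
    positions = [rng.randrange(N)]   # sequence index of each active instance
    probes = 0
    while len(infected) != len(vulnerable) and budget > probes:
        # Instances added during this round start probing in the next one.
        for k in range(len(positions)):
            addr = perm(positions[k])
            positions[k] = (positions[k] + 1) % N
            probes += 1
            if addr in infected:
                # Another instance is presumably working here: jump away.
                positions[k] = rng.randrange(N)
            elif addr in vulnerable:
                infected.add(addr)
                positions.append(rng.randrange(N))   # new instance, random start
    return probes, len(infected), len(vulnerable)


probes, got, total = simulate()
print(f"{got}/{total} vulnerable hosts infected after {probes} probes "
      f"({probes / N:.2f} times the size of the address space)")
```
</pre>
<p>
The reported probe count depends on the seed and the assumed constants; 
comparing it with the random-scanning estimate of about N ln n probes to find all 
n targets gives a rough feel for the difference coordination makes.
</p>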
<p>
Permutation scanning effectively avoids redundant scanning and infection 
attempts while providing perfect fault tolerance: as long as one worm 
instance is still active, all IPs will eventually be scanned. It is a 
theoretical concept devised by Nick Weaver (see below) and has not yet 
been encountered in the wild.</p></div>


<h2>Hypothetical worm designs</h2>

<div>
<p>
By using advanced scanning techniques and clever divide-and-conquer cooperation 
between worm instances, it is possible to construct worms that reach 
saturation much faster than any seen so far.
</p>
<p>
One famous hypothetical design is Nick Weaver's <span class="keyword">"Warhol 
Worm"</span> [Weaver2001], named after Andy Warhol's "Fifteen minutes of fame" 
quote. In simulations, it was able to reach saturation within 15 minutes by using 
the permutation scanning method.
</p>
<p>
Stuart Staniford et al. have shown that a hit-list worm that cleverly distributes
the list among newly infected hosts would be able to reach saturation within 
tens of seconds (they call it a <span class="keyword">"flash worm"</span>), and 
that composing such a hit list is perfectly feasible for a determined attacker 
and can be accomplished without raising suspicion [Staniford2003-2].
</p>


</div>





</body>

</html>

