- Mimimorphism: A New Approach to Binary Code Obfuscation
Zhenyu Wu, Steven Gianvecchio,
Mengjun Xie, and Haining Wang
Binary obfuscation plays an essential role in evading malware static analysis and detection.
Widely used code obfuscation techniques, such as polymorphism and metamorphism, focus on
evading syntax-based detection. However, statistical tests and semantic analysis techniques have
been developed to thwart their evasion attempts. More recent binary obfuscation techniques
target either the statistical or the semantic approach, but not both.
In this paper, we introduce mimimorphism, a novel binary obfuscation technique with the potential
to evade both statistical and semantic detection. Mimimorphic malware uses instruction-syntax-aware
high-order mimic functions to transform its binary into mimicry executables that exhibit high
similarity to benign programs in terms of statistical properties and semantic characteristics.
We implement a prototype of the mimimorphic engine on the Intel x86 platform, and evaluate
its capability of evading statistical anomaly detection and semantic analysis detection techniques.
Our experimental results demonstrate that the mimicry executables are indistinguishable from
benign programs in terms of byte frequency distribution and entropy, as well as control flow fingerprint.
Appeared in ACM CCS 2010, Chicago, IL, October 2010.
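The byte frequency distribution and entropy named in the abstract above are standard statistical measures of a binary. The following is a minimal Python sketch of how they are commonly computed; it is not taken from the paper, the function names are illustrative assumptions, and the detectors evaluated in the paper may compute these quantities differently.

```python
import math
from collections import Counter


def byte_frequencies(data: bytes) -> list[float]:
    """Return the relative frequency of each of the 256 byte values."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of the byte distribution, in bits per byte."""
    entropy = 0.0
    for p in byte_frequencies(data):
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy
```

Entropy ranges from 0 bits per byte for a constant byte stream to 8 bits per byte for a uniform one, so a detector based on it can only separate binaries whose values differ.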
- A Collaboration-based Autonomous Reputation System for Email Services
Mengjun Xie and Haining Wang
This paper presents CARE, an autonomous email reputation system based on inter-domain collaboration.
Within the framework of CARE, each domain independently builds its reputation database based on both
the local email history and the information exchanged with other collaborating domains. CARE examines
the trustworthiness of the email histories obtained from collaborators by correlating them with the
local email history. To validate the efficacy of CARE, we have analyzed real email logs, conducted
a DNS-based estimation experiment, and performed a series of simulations. Our experimental results
show that CARE can effectively improve the reliability and performance of email systems.
Appeared in IEEE INFOCOM 2010, San Diego, CA, March 2010.
- Battle of Botcraft: Fighting Bots in Online Games with Human Observational Proofs
Steven Gianvecchio, Zhenyu Wu,
Mengjun Xie, and Haining Wang
The abuse of online games by automated programs, known as game bots, for gaining unfair advantages
has plagued millions of participating players with escalating severity in recent years. The current
methods for distinguishing bots from humans are based on human interactive proofs (HIPs), such as CAPTCHAs.
However, HIP-based approaches have inherent drawbacks. In particular, they are too obtrusive to be
tolerated by human players in a gaming context. In this paper, we propose a non-interactive approach
based on human observational proofs (HOPs) for continuous game bot detection. HOPs differentiate bots
from human players by passively monitoring input actions that are difficult for current bots to perform
in a human-like manner. We collect a series of user-input traces in one of the most popular online games,
World of Warcraft. Based on the traces, we characterize the game playing behaviors of bots and humans.
Then, we develop a HOP-based game bot defense system that analyzes user-input actions with a
cascade-correlation neural network to distinguish bots from humans. The HOP system is effective in
capturing current game bots, which raises the bar against game exploits and forces a determined adversary
to build more complicated game bots for detection evasion in the future.
Appeared in ACM CCS 2009, Chicago, IL, November 2009.
- Measurement and Classification of Humans and Bots in Internet Chat
Steven Gianvecchio, Mengjun Xie,
Zhenyu Wu, and Haining Wang
The abuse of chat services by automated programs, known as chat bots, poses a serious threat to Internet
users. Chat bots target popular chat networks to distribute spam and malware. In this paper, we first
conduct a series of measurements on a large commercial chat network. Our measurements capture a total
of 14 different types of chat bots ranging from simple to advanced. Moreover, we observe that human
behavior is more complex than bot behavior. Based on the measurement study, we propose a classification
system to accurately distinguish chat bots from human users. The proposed classification system consists
of two components: (1) an entropy-based classifier and (2) a machine-learning-based classifier. The two
classifiers complement each other in chat bot detection. The entropy-based classifier is more accurate
in detecting unknown chat bots, whereas the machine-learning-based classifier is faster in detecting known
chat bots. Our experimental evaluation shows that the proposed classification system is highly effective
in differentiating bots from humans.
Appeared in USENIX Security 2008, San Jose, CA, July 2008.
- Swift: A Fast Dynamic Packet Filter
Zhenyu Wu, Mengjun Xie, and Haining Wang
This paper presents Swift, a packet filter for high-performance packet capture on commercial
off-the-shelf hardware. The key features of Swift include (1) extremely low filter update latency for
dynamic packet filtering, and (2) Gbps high-speed packet processing. Based on a complex instruction set
computer (CISC) instruction set architecture (ISA), Swift achieves the former with an instruction set
design that avoids the need for compilation and security checking, and the latter by mainly utilizing
SIMD (single instruction, multiple data). We implement Swift in the Linux 2.6 kernel for both i386
and x86_64 architectures. The Swift userspace library supports two sets of application programming
interfaces (APIs): a BPF-friendly API for backward compatibility and an object-oriented API for simplifying
filter coding. We extensively evaluate the dynamic and static filtering performance of Swift on multiple
machines with different hardware setups. We compare Swift with BPF (the BSD packet filter)--the de facto
standard for packet filtering in modern operating systems--and hand-coded optimized C filters that are
used for demonstrating possible performance gains. For dynamic filtering tasks, Swift is at least three
orders of magnitude faster than BPF in terms of filter update latency. For static filtering tasks, Swift
outperforms BPF by up to three times in terms of packet processing speed, and achieves performance much closer
to that of the optimized C filters.
Appeared in USENIX NSDI 2008, San Francisco, CA, April 2008.
- HoneyIM: Fast Detection and Suppression of Instant Messaging Malware in Enterprise-like Networks
Mengjun Xie, Zhenyu Wu, and Haining Wang
Instant messaging (IM) has been one of the most frequently used malware attack vectors due to its popularity.
Unlike other malware, IM malware can easily find and hit its next victim by
exploiting the current victim’s contact list and playing social engineering tricks. Thus, the spread
of IM malware is much harder to detect and suppress through conventional approaches. Previous solutions
are ineffective in defending against IM malware in an enterprise-like network environment, mainly because
of their high false positive rates and their requirement that the IM server be inside the protected network.
In this paper, we propose a novel IM malware detection and suppression mechanism, HoneyIM, which guarantees
almost zero false positives in detecting and blocking IM malware in an enterprise-like network. The detection
component of HoneyIM is based on the honeypot concept. HoneyIM uses decoy accounts to trap IM malware by leveraging
malware spreading characteristics. Fed with accurate detection results, the suppression component of HoneyIM can
perform network-wide blocking. In addition, HoneyIM delivers attack information to network administrators
in real time so that system quarantine and recovery can be quickly performed. The core design of HoneyIM is
generic, and can be applied to scenarios in which either enterprise IM services or public IM services are
used in the protected network. Based on the open-source IM client Pidgin and the client honeypot Capture, we build
a prototype of HoneyIM and validate its efficacy through both simulations and real experiments. Our results
show that HoneyIM provides effective protection against IM malware in enterprise-like networks.
Appeared in ACSAC 2007, Miami Beach, FL, December 2007.
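The decoy-account idea in the HoneyIM abstract above can be illustrated with a short Python sketch. It rests on an assumption the abstract does not spell out: that no legitimate user ever messages a decoy account, so any message addressed to one marks its sender as likely infected. The `Message` type and `infected_senders` function are hypothetical names for illustration, not part of HoneyIM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """A hypothetical record of one observed IM message."""
    sender: str
    recipient: str


def infected_senders(messages, decoys):
    """Return the senders that contacted at least one decoy account.

    Assumption: decoy accounts receive no legitimate traffic, so a single
    message to a decoy is treated as evidence of IM malware on the sender.
    """
    decoy_set = set(decoys)
    return {m.sender for m in messages if m.recipient in decoy_set}
```

For example, given the messages `alice -> bob` and `carol -> decoy1` with `decoys = ["decoy1"]`, only `carol` is reported. The low false positive rate claimed in the abstract would follow from the assumption that decoys are never contacted legitimately.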
- Automatic Cookie Usage Setting with CookiePicker
Chuan Yue, Mengjun Xie, and Haining Wang
HTTP cookies have been widely used for maintaining session state, personalization, authentication, and
tracking user behavior. Despite their importance and usefulness, cookies have raised public concerns
about Internet privacy because they can be exploited by Web sites to track and build user profiles. In addition,
stolen cookies may also cause security problems. However, current Web browsers lack secure and convenient
mechanisms for cookie management. A cookie management scheme that is easy to use and has minimal privacy
risk is in great demand, but designing such a scheme is a challenge. In this paper, we introduce CookiePicker,
a system that can automatically validate the usefulness of cookies from a Web site and set the cookie
usage permission on behalf of users. CookiePicker helps users achieve the maximum benefit brought by cookies,
while minimizing the possible privacy and security risks. We implement CookiePicker as an extension to
the Firefox Web browser, and obtain promising results in the experiments.
Appeared in IEEE DSN 2007, Edinburgh, UK, June 2007.
- An Effective Defense Against Email Spam Laundering
Mengjun Xie, Heng Yin, and Haining Wang
Laundering email spam through open proxies or compromised PCs is a widely used trick to conceal real spam
sources and reduce spamming cost in the underground email spam industry. Spammers have been plaguing the Internet
by exploiting a large number of spam proxies. A facility for breaking spam laundering and deterring spamming
activities close to their sources, which would greatly benefit not only email users but also victim ISPs,
is in great demand but still missing. In this paper, we reveal one salient characteristic of proxy-based
spamming activities, namely packet symmetry, by analyzing protocol semantics and timing causality. Based on
the packet symmetry exhibited in spam laundering, we propose a simple and effective technique, DBSpam,
to detect and break spam laundering activities online inside a customer network. Monitoring the bidirectional
traffic passing through a network gateway, DBSpam utilizes a simple statistical method, the Sequential Probability
Ratio Test, to detect the occurrence of spam laundering in a timely manner. To balance the goals of promptness
and accuracy, we introduce a noise-reduction technique in DBSpam, after which the laundering path can be
identified more accurately. Then, DBSpam activates its spam suppressing mechanism to break the spam laundering.
We implement a prototype of DBSpam based on libpcap, and validate its efficacy through both theoretical analyses
and trace-based experiments.
Appeared in ACM CCS 2006, Alexandria, VA, November 2006.
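The Sequential Probability Ratio Test named in the DBSpam abstract above is a standard technique (Wald's SPRT). The following is a minimal Python sketch for binary observations. It is not the DBSpam implementation: the observation model (a hypothetical 0/1 indicator per monitored event), the parameters `p0` and `p1`, and the error rates `alpha` and `beta` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def sprt(observations, p0, p1, alpha=0.01, beta=0.01):
    """Run Wald's SPRT on a sequence of 0/1 observations.

    H0: each observation is 1 with probability p0.
    H1: each observation is 1 with probability p1.
    alpha is the target false positive rate, beta the target false negative rate.
    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing this accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing this accepts H0
    llr = 0.0                              # running log-likelihood ratio
    used = 0
    for x in observations:
        used += 1
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", used
        if llr <= lower:
            return "H0", used
    return "undecided", used
```

The appeal of the sequential test is that it stops as soon as the evidence is strong enough, which matches the abstract's goal of detecting laundering "in a timely manner" rather than after a fixed sample size.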
- Identifying Low-Profile Web Server's IP Fingerprint
Mengjun Xie, Keywan Tabatabai, and Haining Wang
With the immense success of the World Wide Web, Web servers have become ubiquitous for all kinds of organizations,
even for individuals. While most previous research has been conducted on high-profile Web servers, the majority
of Web servers on the Internet are low-profile. In this paper, we focus on the low-profile Web servers inside
a medium-sized campus network. We collect eight months of traces from ten departmental Web servers and investigate
the dynamics of IP addresses of their remote clients. After analyzing accesses of remote clients to the
monitored servers, we find that (1) the pool of 32-bit IP addresses seen by a server rarely converges to a
stable set, i.e., there is always a large portion of previously unseen 32-bit IP addresses in each weekly trace;
(2) however, the group of frequent visitors to a server is relatively stable, and a simple clustering by 24-bit
IP prefix further confirms this observation; (3) although the portion of frequent visitors is small, the volume
of requests they issue dominates in total; (4) last but not least, each Web server has its own group of loyal
clients excluding Web crawlers. We call such a relatively stable and unique pool of "loyal" clients for each
low-profile Web server its IP fingerprint.
Appeared in IEEE QEST 2006, Riverside, CA, September 2006.
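The "simple clustering by 24-bit IP prefix" mentioned in the abstract above can be sketched in Python with the standard `ipaddress` module. This is only an illustration under the assumption of IPv4 client addresses given as strings; the function names are hypothetical, and the paper's actual clustering and its definition of a frequent visitor are not specified in the abstract.

```python
import ipaddress
from collections import Counter


def prefix24(addr: str) -> str:
    """Map an IPv4 address to its enclosing /24 network, e.g. '192.0.2.0/24'."""
    # strict=False lets ip_network mask off the host bits of the address.
    return str(ipaddress.ip_network(f"{addr}/24", strict=False))


def cluster_by_prefix(addrs):
    """Count how many requests came from each 24-bit prefix."""
    return Counter(prefix24(a) for a in addrs)
```

Grouping by prefix merges clients that change their last octet between visits, for instance through dynamic address assignment, which is one plausible reason the prefix-level view is more stable than the 32-bit view.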