Wireless sensor systems have significant potential for aiding scientific studies by instrumenting the
real world and collecting measurements, with the aim of observing, detecting, and tracking scientific
phenomena that were previously only partially (or not at all) observable or understood. Such technology
can facilitate environmental protection, public health, and the prevention of economic losses and loss of
life. However, one obstacle to achieving the full potential of such systems is the difficulty of processing,
in a timely and meaningful manner, the huge number of measurements they collect. Given such large
volumes of data, one natural question might be: Can we devise an efficient automated approach to
identifying the ``interesting'' parts of these data sets? We can view identification of such ``interesting''
or ``unexpected'' measurements (or events) in collected data as anomaly detection (we use the
generic term ``anomaly'' for all interesting (typically, other-than-normal) events occurring in either
the measured phenomena or the measuring equipment). Automated online (or real-time) anomaly
detection in measurements collected by sensor systems has been the topic of my recent work.
Our initial efforts [D14,A3]
focused on what can be viewed as a special case of anomaly detection,
namely fault detection. Sensor network measurement studies have reported instances of
transient faults in sensor readings; thus, we sought to answer a simple question: How often are
such faults observed in real deployments? We explored and characterized four qualitatively different
classes of fault detection methods: (1) rule-based methods that leverage domain knowledge
to develop heuristic rules for detecting and identifying faults; (2) estimation methods that predict
``normal'' sensor behavior by leveraging sensor correlations, flagging anomalous sensor readings as
faults; (3) methods based on time series analysis that start with an a priori model for sensor readings
and compare a sensor measurement against its predicted value to determine if it is faulty; and (4)
learning-based methods that infer a model for the ``normal'' sensor readings using training data, and
then statistically detect and identify classes of faults. We found that these classes of methods sit at
different points on the accuracy/robustness spectrum. Rule-based methods can be highly accurate,
but their accuracy depends critically on the choice of parameters. Learning methods can be cumbersome
to train, but can accurately detect and classify faults. Estimation methods are accurate, but
cannot classify faults. Methods based on time series analysis are more effective for detecting
short-duration faults than long-duration ones, and incur more false positives than the other methods. We
applied these techniques to real-world sensor data sets and found that all methods were qualitatively
consistent in identifying sensor faults and that the prevalence and types of faults varied across
data sets. This work was an initial step towards automated online fault detection and classification.
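To make the rule-based class concrete, the following is a minimal sketch of two heuristic rules, assuming a single sensor's readings arrive as a list of numbers. The function names, the two rules (a jump rule and a stuck-value rule), and the parameters `max_step`, `window`, and `min_variance` are illustrative assumptions, not the rules evaluated in [D14,A3].

```python
from statistics import pvariance


def detect_spike_faults(readings, max_step):
    """Return indices whose reading differs from the previous one by more than max_step."""
    return [
        i
        for i in range(1, len(readings))
        if abs(readings[i] - readings[i - 1]) > max_step
    ]


def detect_stuck_faults(readings, window, min_variance):
    """Return start indices of windows whose variance is at or below min_variance."""
    return [
        start
        for start in range(0, len(readings) - window + 1)
        if pvariance(readings[start:start + window]) <= min_variance
    ]


# Hypothetical temperature trace with one spike and one stuck stretch.
temps = [20.1, 20.3, 20.2, 35.0, 20.4, 20.4, 20.4, 20.4, 20.5]

print(detect_spike_faults(temps, max_step=5.0))    # [3, 4]
print(detect_spike_faults(temps, max_step=20.0))   # []
print(detect_stuck_faults(temps, window=4, min_variance=1e-9))  # [4]
```

The second call shows the parameter sensitivity noted above: with a looser `max_step`, the same rule misses the spike entirely.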
Recently, we have expanded our efforts to more general anomaly detection [F1]. A good anomaly
detection methodology for sensor systems should be able to accurately identify many types of
anomalies, be robust, require relatively few resources, and perform detection in (near) real-time.
Hence, we focused on an approach to online anomaly detection in measurements collected by sensor
systems that exhibits these characteristics. Our goal was to detect anomalies without ``pre-defining''
what is an anomaly. Briefly, our approach leverages temporal and spatial correlations in sensor system
measurements and constructs (what is essentially) a piecewise linear model of sensor data time
series. To detect anomalies, we compare the piecewise linear model of sensor data (collected during
a time interval) with a reference model (also computed online), where significant differences (as
determined by a similarity metric) are flagged as anomalies. Our extensive evaluation study, using
data from real sensor system deployments, illustrates that our approach is accurate, robust, and efficient.
We believe this to be a promising direction as the results also indicate that our online approach is
more accurate than some of the (more computationally intensive) offline techniques.
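The general idea can be sketched as follows. This is only an illustration under stated assumptions, not the method of [F1]: the greedy segmentation rule, the mean-absolute-difference similarity metric, the equal-length windows, and the names `piecewise_linear`, `dissimilarity`, `is_anomalous`, `tolerance`, and `threshold` are all hypothetical choices made for the example.

```python
def fits(samples, start, end, tolerance):
    """True if the straight line from samples[start] to samples[end]
    stays within tolerance of every sample strictly between them."""
    span = end - start
    for i in range(start + 1, end):
        predicted = samples[start] + (samples[end] - samples[start]) * (i - start) / span
        if abs(samples[i] - predicted) > tolerance:
            return False
    return True


def piecewise_linear(samples, tolerance):
    """Greedily choose breakpoints; returns a list of (index, value) pairs.
    Assumes at least two samples."""
    breakpoints = [0]
    start = 0
    for end in range(2, len(samples)):
        if not fits(samples, start, end, tolerance):
            start = end - 1          # close the segment at the last index that fit
            breakpoints.append(start)
    breakpoints.append(len(samples) - 1)
    return [(i, samples[i]) for i in breakpoints]


def evaluate(model, i):
    """Value of the piecewise linear model at sample index i."""
    for (x0, y0), (x1, y1) in zip(model, model[1:]):
        if x0 <= i <= x1:
            return y0 + (y1 - y0) * (i - x0) / (x1 - x0)
    raise ValueError("index outside model range")


def dissimilarity(model_a, model_b, length):
    """Mean absolute difference between two models over `length` sample indices."""
    return sum(abs(evaluate(model_a, i) - evaluate(model_b, i)) for i in range(length)) / length


def is_anomalous(window, reference, tolerance, threshold):
    """Flag a window whose model differs from the reference model by more than threshold.
    Assumes window and reference have the same length."""
    model = piecewise_linear(window, tolerance)
    ref_model = piecewise_linear(reference, tolerance)
    return dissimilarity(model, ref_model, len(window)) > threshold


# Hypothetical data: a reference pattern, a noisy but normal window, and an anomalous one.
reference = [0, 1, 2, 3, 4, 3, 2, 1, 0]
normal = [0, 1.1, 2, 2.9, 4, 3, 2.1, 1, 0]
anomalous = [0, 1, 2, 3, 4, 9, 9, 1, 0]

print(piecewise_linear(reference, 0.2))             # [(0, 0), (4, 4), (8, 0)]
print(is_anomalous(normal, reference, 0.2, 0.5))    # False
print(is_anomalous(anomalous, reference, 0.2, 0.5)) # True
```

Comparing compact models rather than raw samples is what keeps the resource cost low in this sketch: small measurement noise is absorbed by `tolerance`, so the normal window collapses to the same three breakpoints as the reference.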
In the context of (multi-hop) sensor networks, we have also studied energy-efficient data compression
and transmission. In such an environment, data compression is advantageous only when the
transmission energy saved (from having to transmit less data) is greater than the energy expended
in compressing the data.
Thus, the aim of our work [D5] was to design a joint policy for compression, transmission
scheduling, and transmission power control that minimizes the time-average energy
expenditure while achieving network stability. Our algorithm can achieve a time-average power expenditure
arbitrarily close to the optimum, while adapting to topology, network dynamics, application
data rates, and platform differences.
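The break-even condition stated above (compression helps only when the transmission energy saved exceeds the energy spent compressing) can be written as a small check. This is a sketch of that condition only, not the policy of [D5]; the function name, the per-bit energy model, the assumption that every hop pays the same transmission cost, and all numeric values are hypothetical.

```python
def compression_pays_off(raw_bits, compression_ratio, tx_energy_per_bit,
                         compress_energy_per_bit, hops=1):
    """Return True if compressing before sending saves energy overall.

    compression_ratio is compressed size / raw size (e.g. 0.5 halves the data).
    Energies are in joules per bit; hops is the number of transmissions the
    data needs to reach the sink (assumed to cost the same at every hop).
    """
    compressed_bits = raw_bits * compression_ratio
    energy_saved = (raw_bits - compressed_bits) * tx_energy_per_bit * hops
    energy_spent = raw_bits * compress_energy_per_bit
    return energy_saved > energy_spent


# Hypothetical platform: 200 nJ/bit to transmit, 150 nJ/bit to compress.
print(compression_pays_off(8000, 0.5, 200e-9, 150e-9, hops=1))  # False
print(compression_pays_off(8000, 0.5, 200e-9, 150e-9, hops=3))  # True
```

With these made-up numbers, compression does not pay for itself over a single hop but does over three, which illustrates why the decision depends on topology and platform parameters rather than being fixed in advance.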