Wireless Sensor Systems

Wireless sensor systems have significant potential for aiding scientific studies by instrumenting the real world and collecting measurements, with the aim of observing, detecting, and tracking scientific phenomena that were previously only partially (or not at all) observable or understood. Such technology can facilitate environmental protection, public health, and the prevention of economic losses and loss of life. However, one obstacle to realizing the full potential of such systems is the difficulty of processing, in a timely and meaningful manner, the huge number of measurements they collect. Given such large volumes of data, a natural question is: can we devise an efficient, automated approach to identifying the ``interesting'' parts of these data sets? We can view the identification of such ``interesting'' or ``unexpected'' measurements (or events) in collected data as anomaly detection. (We use the generic term ``anomaly'' for any interesting, typically other-than-normal, event occurring in either the measured phenomena or the measuring equipment.) Automated online (or real-time) anomaly detection in measurements collected by sensor systems has been the topic of my recent work.

Our initial efforts [D14,A3] focused on what can be viewed as a special case of anomaly detection, namely fault detection. Sensor network measurement studies have reported instances of transient faults in sensor readings; thus, we sought to answer a simple question: how often are such faults observed in real deployments? We explored and characterized four qualitatively different classes of fault detection methods: (1) rule-based methods, which leverage domain knowledge to develop heuristic rules for detecting and identifying faults; (2) estimation methods, which predict ``normal'' sensor behavior by leveraging sensor correlations and flag anomalous sensor readings as faults; (3) time-series-analysis-based methods, which start with an a priori model for sensor readings and compare each sensor measurement against its predicted value to determine whether it is faulty; and (4) learning-based methods, which infer a model of ``normal'' sensor readings from training data and then statistically detect and identify classes of faults. We found that these classes of methods sit at different points on the accuracy/robustness spectrum. Rule-based methods can be highly accurate, but their accuracy depends critically on the choice of parameters. Learning methods can be cumbersome to train, but can accurately detect and classify faults. Estimation methods are accurate, but cannot classify faults. Time-series-analysis-based methods are more effective at detecting short-duration faults than long-duration ones, and incur more false positives than the other methods. Applying these techniques to real-world sensor data sets, we found that all methods were qualitatively consistent in identifying sensor faults, and that the prevalence and types of faults varied across data sets. This work was an initial step toward automated online fault detection and classification.
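As a rough illustration of the rule-based class, the Python sketch below implements two simple heuristic rules: a spike rule that flags a reading differing sharply from both of its neighbors, and a ``stuck-at'' rule that flags long runs of (nearly) identical readings. The function names, the threshold `delta`, the run length `min_run`, and the tolerance `eps` are hypothetical choices for illustration, not the rules or parameters used in [D14,A3]. Changing them changes what gets flagged, which is the parameter sensitivity of rule-based methods noted above.

```python
from typing import List, Sequence, Tuple


def spike_faults(readings: Sequence[float], delta: float) -> List[int]:
    """Indices whose reading differs by more than `delta` from BOTH neighbors
    (a short, transient spike). `delta` is a hypothetical tuning parameter."""
    return [
        i
        for i in range(1, len(readings) - 1)
        if abs(readings[i] - readings[i - 1]) > delta
        and abs(readings[i] - readings[i + 1]) > delta
    ]


def stuck_faults(readings: Sequence[float], min_run: int,
                 eps: float = 0.0) -> List[Tuple[int, int]]:
    """Half-open index ranges [start, end) of at least `min_run` consecutive
    readings that stay within `eps` of the run's first value (a sensor stuck
    at one value). `min_run` and `eps` are hypothetical tuning parameters."""
    runs: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(readings) + 1):
        # The current run ends at the end of the series or when a reading
        # moves more than eps away from the run's first value.
        if i == len(readings) or abs(readings[i] - readings[start]) > eps:
            if i - start >= min_run:
                runs.append((start, i))
            start = i
    return runs


if __name__ == "__main__":
    # Synthetic temperature-like series (made-up values, not deployment data).
    series = [20.1, 20.3, 35.0, 20.4, 20.5, 20.5, 20.5, 20.5, 20.6]
    print(spike_faults(series, delta=5.0))
    print(stuck_faults(series, min_run=4))
```

On this synthetic series, the spike rule flags index 2 and the stuck-at rule flags the run of four identical readings at indices 4 to 7. Raising `delta` to 20.0 makes the spike go undetected, a small example of how the choice of parameters drives the accuracy of such rules.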

Recently, we have expanded our efforts to more general anomaly detection [F1]. A good anomaly detection methodology for sensor systems should accurately identify many types of anomalies, be robust, require relatively few resources, and perform detection in (near) real time. Hence, we focused on an approach to online anomaly detection in measurements collected by sensor systems that exhibits these characteristics. Our goal was to detect anomalies without ``pre-defining'' what an anomaly is. Briefly, our approach leverages temporal and spatial correlations in sensor system measurements and constructs what is essentially a piecewise linear model of the sensor data time series. To detect anomalies, we compare the piecewise linear model of the sensor data collected during a time interval against a reference model (also computed online), and flag significant differences, as determined by a similarity metric, as anomalies. Our extensive evaluation study, using data from real sensor system deployments, shows that our approach is accurate, robust, and efficient. We believe this is a promising direction, as the results also indicate that our online approach is more accurate than some of the (more computationally intensive) offline techniques.
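The following Python sketch illustrates the general shape of this idea under simplifying assumptions. It fits a piecewise linear model to a single sensor's time series using a simple sliding-window segmentation with an error bound `max_err`, and scores a window against a reference model by the mean absolute gap between the two models at the sample times. The segmentation rule, the dissimilarity measure, the threshold, and all function names are hypothetical stand-ins, not the method or similarity metric of [F1]. The sketch also ignores spatial correlations and simply fits the reference from a given series rather than maintaining it online.

```python
from typing import List, Sequence, Tuple

# A segment is (t_start, x_start, t_end, x_end).
Segment = Tuple[float, float, float, float]


def _fits(ts: Sequence[float], xs: Sequence[float], i: int, j: int,
          max_err: float) -> bool:
    """True if the line joining samples i and j is within max_err of every
    sample strictly between them."""
    slope = (xs[j] - xs[i]) / (ts[j] - ts[i])
    return all(abs(xs[i] + slope * (ts[k] - ts[i]) - xs[k]) <= max_err
               for k in range(i + 1, j))


def fit_piecewise_linear(ts: Sequence[float], xs: Sequence[float],
                         max_err: float) -> List[Segment]:
    """Greedy sliding-window segmentation (a simple stand-in technique)."""
    segments: List[Segment] = []
    start = 0
    while start < len(ts) - 1:
        end = start + 1
        # Extend the current segment while the error bound still holds.
        while end + 1 < len(ts) and _fits(ts, xs, start, end + 1, max_err):
            end += 1
        segments.append((ts[start], xs[start], ts[end], xs[end]))
        start = end
    return segments


def evaluate(segments: List[Segment], t: float) -> float:
    """Model value at time t, held constant outside the modeled span.
    Assumes at least one segment."""
    for t0, x0, t1, x1 in segments:
        if t <= t1:
            if t <= t0:
                return x0
            return x0 + (x1 - x0) * (t - t0) / (t1 - t0)
    return segments[-1][3]


def dissimilarity(model: List[Segment], reference: List[Segment],
                  ts: Sequence[float]) -> float:
    """Mean absolute difference between two models at the given times
    (a hypothetical metric, chosen for simplicity)."""
    return sum(abs(evaluate(model, t) - evaluate(reference, t))
               for t in ts) / len(ts)


if __name__ == "__main__":
    ts = [float(t) for t in range(10)]
    ref = fit_piecewise_linear(ts, [20 + 0.5 * t for t in ts], max_err=0.2)
    normal = [20.1 + 0.5 * t for t in ts]
    shifted = [20 + 0.5 * t + (5.0 if t >= 5 else 0.0) for t in ts]
    threshold = 1.0  # hypothetical anomaly threshold
    for name, xs in (("normal", normal), ("shifted", shifted)):
        model = fit_piecewise_linear(ts, xs, max_err=0.2)
        score = dissimilarity(model, ref, ts)
        print(name, len(model), round(score, 2), score > threshold)
```

In this toy run, the slightly offset series is captured by a single segment that stays close to the reference, while the series with a level shift needs three segments and scores well above the threshold. A larger `max_err` yields fewer, coarser segments (a more compact model) at the cost of fidelity, which is one plausible knob for trading accuracy against resource use.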

In the context of (multi-hop) sensor networks, we have also studied energy-efficient data compression and transmission. In such an environment, data compression is advantageous only when the transmission energy saved (by transmitting less data) exceeds the energy expended in compressing the data. Thus, the aim of our work [D5] was to design a joint compression, transmission scheduling, and transmission power control policy that minimizes the time-average energy expenditure while achieving network stability. Our algorithm can achieve time-average power expenditure arbitrarily close to the optimum, while adapting to topology, network dynamics, application data rates, and platform differences.
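The break-even condition above can be made concrete with a simple energy model. Suppose, purely as an assumption for illustration, that a node compresses B bits down to ρB bits at a cost of e_c per input bit, and that each of h hops spends e_tx per bit it transmits. Compression then pays off when h·e_tx·B·(1 - ρ) > e_c·B, that is, when h·e_tx·(1 - ρ) > e_c. The Python sketch below encodes only this per-decision inequality with hypothetical names and constants. It is not the joint compression, transmission scheduling, and power control policy of [D5], which additionally handles scheduling, power control, and network stability.

```python
def should_compress(bits: float, ratio: float, e_tx_per_bit: float,
                    e_comp_per_bit: float, hops: int = 1) -> bool:
    """Return True when compressing `bits` of data before sending saves energy.

    All parameters are hypothetical inputs to a simplified energy model:
      ratio          -- compressed size / original size (0 < ratio <= 1)
      e_tx_per_bit   -- energy to transmit one bit over one hop
      e_comp_per_bit -- energy to compress one input bit
      hops           -- number of hops; each hop retransmits the payload
    """
    # Transmission energy avoided on every hop by sending fewer bits.
    saved_tx = hops * e_tx_per_bit * bits * (1.0 - ratio)
    # One-time cost of compressing at the source.
    compression_cost = e_comp_per_bit * bits
    return saved_tx > compression_cost


if __name__ == "__main__":
    # Made-up constants in arbitrary energy units per bit.
    for h in (1, 2):
        print(h, should_compress(8000, ratio=0.5, e_tx_per_bit=1.0,
                                 e_comp_per_bit=0.6, hops=h))
```

With these made-up numbers, compression does not pay off over a single hop but does over two hops, because every hop retransmits the (smaller) payload. This simple model already suggests why a good policy must adapt to topology and platform differences, since both h and the per-bit energy costs vary.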

[Last updated Mon Mar 29 2010]