Estimating Reliability of Software Architectures

Early prediction of reliability is important in building dependable software systems, as problems discovered in later stages (e.g., after implementation) can be quite costly to address. Thus, it is important to estimate reliability at the software architecture level. A number of earlier efforts focused on system-level reliability prediction, essentially assuming that the reliabilities of the components making up the system are known. Our work [D11] in this area strove to remedy this shortcoming by proposing a framework for predicting the reliability of software components during architectural design. The main challenges in component reliability prediction at architectural design time stem from uncertainty and lack of information in the early stages of design and development. Our work focused on dealing with two such uncertainties: (1) the lack of a component's operational profile (since the component has not been implemented yet) and (2) the lack of its failure information (since systems are designed for correct behavior, failure modes are not typically part of an architectural specification). Our evaluation and validation efforts (comparing against implementation-level techniques) indicated that our framework can meaningfully assess a component's reliability.

Predicting a component's reliability was an important first step toward system reliability prediction, on which we focused next. Our work on system-level reliability [D2] is distinct in that it focused on (a) predicting the reliability of concurrent software systems in a scalable yet accurate manner, while (b) taking contention for shared software resources into account. Most existing techniques assume a sequential system. In predicting the reliability of concurrent systems, one typically needs to keep track of the status of all components. Approaches that do so often generate reliability models in a brute-force manner and thus suffer from scalability ("state explosion") problems, which can make generating and solving the models prohibitively costly. To address this, we proposed a hierarchical reliability prediction framework, in which we approached the scalability problem by generating and solving submodels that each capture a part of the system's functionality. In doing so, we leveraged system use-case scenario models (e.g., UML sequence diagrams). Specifically, we solved finer-granularity scenario-based submodels and a coarser-granularity system model; we then combined their results to obtain a system reliability estimate. The motivation is that the submodels are expected to be relatively small, so that solving a number of smaller submodels (rather than one huge model) results in space and computational savings.
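
The idea of solving small scenario submodels and then combining their results can be illustrated with a minimal sketch. It is not the framework of [D2]: it assumes each scenario is modeled as a small absorbing discrete-time Markov chain, that scenarios are mutually exclusive with known occurrence probabilities, and that the system-level combination is a probability-weighted sum. It also ignores concurrency and contention. All names and numbers (absorption_probability, system_reliability, EXAMPLE_SCENARIOS) are hypothetical.

```python
from typing import List, Tuple

# A scenario submodel: (occurrence probability, Q, to_success).
# Q[i][j] is the probability of moving from transient state i to transient
# state j; to_success[i] is the probability of moving from state i directly
# to the success state. Whatever mass is left in row i goes to failure.
Scenario = Tuple[float, List[List[float]], List[float]]


def absorption_probability(Q: List[List[float]], to_success: List[float]) -> float:
    """Probability of reaching success when starting in transient state 0.

    Solves (I - Q) x = to_success by Gauss-Jordan elimination with partial
    pivoting. Assumes a proper absorbing chain, so (I - Q) is invertible.
    """
    n = len(Q)
    # Augmented matrix [I - Q | to_success].
    a = [
        [(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] + [to_success[i]]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    # The matrix is now diagonal; row 0 holds the solution for state 0.
    return a[0][n] / a[0][0]


def system_reliability(scenarios: List[Scenario]) -> float:
    """Combine submodel results (assumed combination: weighted sum)."""
    return sum(p * absorption_probability(Q, s) for p, Q, s in scenarios)


# Hypothetical example: two scenarios with made-up probabilities.
EXAMPLE_SCENARIOS: List[Scenario] = [
    # Two steps in sequence: step 0 succeeds w.p. 0.99, step 1 w.p. 0.98.
    (0.7, [[0.0, 0.99], [0.0, 0.0]], [0.0, 0.98]),
    # One step that retries itself w.p. 0.5 and succeeds w.p. 0.45.
    (0.3, [[0.5]], [0.45]),
]

if __name__ == "__main__":
    print(system_reliability(EXAMPLE_SCENARIOS))
```

In this sketch each submodel is solved on its own, so the cost grows with the size of the largest scenario rather than with the size of a single flat model covering all component states at once.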

We were able to achieve significantly better scalability without sacrificing the level of detail modeled. For example, some existing techniques model a component as being either on or off; doing so gives information about which component failed but not about what caused the failure. By using a hierarchical approach, we are able to retain a significant level of detail about the system being modeled, while modeling it in a scalable manner. Moreover, our approach allows for modeling contention between software components. (In a concurrent system, components share resources, e.g., services provided by other software components.) We validated the accuracy of our approach and illustrated that significant space and computational benefits are obtained in practice. Our evaluation also indicated that it is important to model contention when predicting a concurrent system's reliability, as doing so improves the accuracy of the prediction.

[Last updated Mon Mar 29 2010]