Early prediction of reliability is important in building dependable software systems, as problems
discovered in later stages (e.g., after implementation) can be quite costly to address. Thus, it is important
to estimate reliability at the software architecture level. A number of efforts (prior to ours)
focused on system-level reliability prediction, essentially assuming that reliabilities of components
(making up the system) are known. Our work [D11] in this area sought to remedy the shortcomings
of previous approaches by proposing a framework for predicting reliability of software components
during architectural design. Important challenges in component reliability prediction at architectural
design time are due to uncertainties and lack of information in the early stages of design and
development. Our work focused on dealing with two such uncertainties: (1) the lack of a component's
operational profile (since the component has not been implemented yet) and (2) the lack of failure information (since systems
are designed for correct behavior, failure modes are not typically part of an architectural specification).
Our evaluation and validation efforts (as compared to implementation-level techniques)
indicated that our framework can meaningfully assess a component's reliability.
Predicting a component's reliability was an important first step toward system reliability prediction,
on which we focused next. Our work on system-level reliability [D2] is distinct as it focused on (a)
predicting reliability of concurrent software systems in a scalable yet accurate manner, while (b)
taking contention between software resources into account. The improved scalability is achieved
through a hierarchical approach. Most existing techniques assume a sequential system. In predicting
reliability of concurrent systems, one typically needs to keep track of the status of all components.
Approaches that do so often generate reliability models in a brute-force manner and
thus suffer from scalability (``state explosion'') problems, which often makes generating and solving
the models prohibitively costly. To address this, we proposed a hierarchical reliability prediction
framework, where we approached the scalability problem by generating and solving submodels that
each capture a part of the system's functionality. In doing so, we leveraged system use-case scenario
models (e.g., UML sequence diagrams). Specifically, we solved finer-granularity scenario-based
submodels and a coarser-granularity system model; we then combined their results to obtain
a system reliability estimate. The motivation here is that the submodels are expected to be relatively
small and that solving a number of smaller submodels (rather than one huge model) results in space
and computational savings.
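
The hierarchical idea can be illustrated with a minimal sketch. It is not the framework of [D2]. It assumes that each scenario submodel is a small absorbing Markov chain whose transient states stand for steps of the scenario, and it reduces the coarser system model to a fixed probability of executing each scenario. The function names, scenario names, and all numeric values below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Scenario = Tuple[float, List[List[float]], List[float]]


def absorption_probability(Q: List[List[float]], r: List[float]) -> float:
    """Probability of reaching the success state from transient state 0.

    Q[i][j] is the probability of moving from transient state i to
    transient state j; r[i] is the probability of moving from state i
    directly to success. The remaining mass, 1 - sum(Q[i]) - r[i], is
    the probability of failing in state i.
    Solves (I - Q) x = r by Gauss-Jordan elimination with partial pivoting.
    """
    n = len(Q)
    # Build the augmented matrix [I - Q | r].
    a = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] + [r[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for row in range(n):
            if row != col and a[row][col] != 0.0:
                f = a[row][col]
                a[row] = [v - f * w for v, w in zip(a[row], a[col])]
    return a[0][n]


def system_reliability(scenarios: Dict[str, Scenario]) -> float:
    """Combine submodel results, weighting each by its scenario probability."""
    return sum(weight * absorption_probability(Q, r)
               for weight, Q, r in scenarios.values())


# Hypothetical scenarios: (probability of execution, Q, r).
scenarios: Dict[str, Scenario] = {
    # Two sequential steps that fail with probability 0.01 and 0.02.
    "browse": (0.7, [[0.0, 0.99], [0.0, 0.0]], [0.0, 0.98]),
    # The second step may loop back to the first (a retry).
    "checkout": (0.3, [[0.0, 0.95], [0.1, 0.0]], [0.0, 0.85]),
}

estimate = system_reliability(scenarios)
print(f"estimated system reliability: {estimate:.4f}")
```

Each submodel here has only two transient states, so each linear system is tiny. That is the intuition behind the savings: the cost grows with the number and size of the submodels rather than with the size of one combined model.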
We were able to achieve significantly better scalability without sacrificing the level of detail
modeled. For instance, some existing techniques model a component as being either on or off; doing
so gives information about which component failed but not about what caused the failure. By using
a hierarchical approach, we are able to retain a significant level of detail about the system being
modeled, while modeling it in a scalable manner. Moreover, our approach also allows for modeling
of contention between software components. (In a concurrent system, components share resources,
e.g., services provided by other software components.) We validated the accuracy of our approach
and illustrated that in practice significant space and computational benefits are obtained. Our evaluation
also indicated that it is important to model contention in predicting a concurrent system's
reliability, as doing so improves the accuracy of the prediction.