Monitoring Testbed Experiments with MonEx


Abdulqawi Saif, Alexandre Merlin, Lucas Nussbaum and Ye-Qiong Song

Abstract [paper]

Almost all testbed experiments deal with metrics of different kinds, collected from and/or about various kinds of resources. Although collecting experiment metrics is an important phase of the experiment life cycle, it is often done through ad hoc, manual, and artisanal actions, such as combining multiple scripts by hand or manually handling missing values. A few tools (e.g. Vendetta, OML) can be used for monitoring experiments. However, they are limited to transmitting metrics to a central server, and they do not cover user-facing features such as plotting and archiving experiment results.

In this talk, we will first discuss the requirements of experiment monitoring. A well-defined set of requirements removes the ambiguity around what any Experiment Monitoring Framework (EMF) should target. The requirements we define are neither testbed-dependent nor technology-dependent, so any testbed community can build its own EMF by implementing them with different software systems.

We will then describe our own proposal, MonEx (short for Monitoring Experiments), an EMF that satisfies all the defined requirements. MonEx is built on top of several off-the-shelf infrastructure monitoring tools and supports various monitoring approaches: pull-based and push-based monitoring, as well as agent-based and agent-less monitoring. MonEx covers all the steps required to monitor experiments, from collecting metrics to archiving experiment data and producing figures.

We will then demonstrate MonEx’s usability through a set of experiments performed on the Grid’5000 testbed and monitored by MonEx. Each of these experiments has different requirements, and together they show how MonEx meets all the defined requirements. We show how MonEx integrates into the experimental workflow and how it simplifies the monitoring task, reducing the effort required from users during experimentation and making the analysis of experiments and the comparison of metrics more repeatable.