Abstract (EN):
In many organizations with distributed operations, not only is data collection distributed, but models are also developed and deployed separately. Understanding the combined knowledge of all the local models can be both important and challenging, especially when the number of models is large. The automated development of consensus models, which aggregate multiple models into a single one, involves several challenges, including fidelity (ensuring that aggregation does not severely degrade predictive performance) and completeness (ensuring that the consensus model covers the same space as the local models). In this paper, we address the latter and propose two measures: geometrical completeness and distributional completeness. The first quantifies the proportion of the decision space that is covered by a model, while the second takes into account the concentration of data in the region covered by the model. The use of these measures is illustrated on a real-world academic management case and on four publicly available datasets. The results indicate that the distributional completeness of the deployed models is consistently higher than their geometrical completeness. Although consensus models tend to be geometrically incomplete, distributional completeness reveals that they cover the regions of the decision space with a higher concentration of data.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
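The abstract does not give formal definitions of the two measures. As a rough illustration only, assume that the region a model covers can be described as a set of axis-aligned boxes (for instance, the leaves of a decision tree or a set of rules) inside a bounded decision space. Under that assumption, geometrical completeness can be approximated as the fraction of the space lying inside at least one box, and distributional completeness as the fraction of observed data points that do. The box representation and the names `covered`, `geometric_completeness` and `distributional_completeness` are hypothetical and are not the paper's definitions.

```python
from itertools import product


def covered(point, boxes):
    """True if the point lies inside at least one box.

    Hypothetical representation: each box is a list of (low, high)
    closed intervals, one per feature.
    """
    return any(
        all(lo <= x <= hi for x, (lo, hi) in zip(point, box))
        for box in boxes
    )


def geometric_completeness(boxes, space, steps=100):
    """Approximate fraction of a bounded decision space covered by the boxes.

    The space is a list of (low, high) bounds per feature; coverage is
    estimated by checking the midpoints of a regular grid of cells.
    """
    axes = [
        [lo + (i + 0.5) * (hi - lo) / steps for i in range(steps)]
        for lo, hi in space
    ]
    points = list(product(*axes))
    return sum(covered(p, boxes) for p in points) / len(points)


def distributional_completeness(boxes, data):
    """Fraction of observed data points that fall inside the covered region."""
    return sum(covered(p, boxes) for p in data) / len(data)


if __name__ == "__main__":
    space = [(0.0, 1.0), (0.0, 1.0)]
    # A model that only covers the lower-left quarter of the space.
    boxes = [[(0.0, 0.5), (0.0, 0.5)]]
    # Data concentrated in the covered quarter.
    data = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.9, 0.9)]
    print(geometric_completeness(boxes, space))       # 0.25
    print(distributional_completeness(boxes, data))   # 0.75
```

In this toy case the model covers only a quarter of the space but three quarters of the data, which mirrors the kind of gap between the two measures that the abstract reports. The grid estimate needs `steps ** d` evaluations in `d` dimensions, so uniform random sampling of the space would be a more practical substitute for higher-dimensional problems.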
Language:
English
Type (Professor's evaluation):
Scientific
No. of pages:
14