Abstract (EN):
Building a Machine Learning model requires large amounts of data. These data may be owned by multiple sites and, because of privacy and regulatory concerns, often cannot be shared among them. Our work deals with private learning and inference for the Weighted Random Forest model when data records are vertically distributed among multiple sites. Previous privacy-preserving vertical tree-based frameworks either adapt Secure Multi-party Computation or share intermediate results, and they are hard to generalize or scale. In contrast, our proposal includes efficient collaborative algorithms for calculating the Gini Index and Entropy, the impurity measures of decision tree nodes, while protecting all intermediate values and disclosing minimal information. We offer a learning protocol based on the Paillier Cryptosystem and Digital Envelope. We also provide an inference protocol based on the Look-up Table. Our experiments show that the proposed protocols cause no loss of predictive performance while building and using the model within a reasonable time. The results imply that practitioners can overcome the barrier of data sharing and produce random forest models for data-heavy domains with strict privacy requirements, such as Health Prediction, Fraud Detection, and Risk Evaluation. © 2022 IEEE.
Language:
English
Type (Professor's evaluation):
Scientific
No. of pages:
10