Quantifying Similarity to MSI-H: A Machine Learning-Based Scoring System to Identify MSI-H-like MSS Colorectal Cancers

Hongkai Yan, MD. #, Li Jiang, Dr.rer.nat. #, Yaqi Li, MD.PhD., Fengchong Wang, MSc., Weiqi Sheng, MD.PhD., Dan Huang, MD.PhD.*, Junjie Peng, MD.PhD.*

Microsatellite-stable (MSS) colorectal cancers (CRCs) show a limited response to immune checkpoint inhibitors (ICIs) compared with microsatellite instability-high (MSI-H) CRCs. Nevertheless, previous studies have shown that some MSS CRCs are sensitive to ICIs, although established criteria for selecting these patients for treatment are still lacking. To address this gap, we aimed to develop a novel computational tool that quantifies, based on multiple factors, how closely an MSS CRC resembles an MSI-H CRC. Data from 188 CRC patients, including MSI status, immune cell distributions, clinical features, and gene mutations, were collected and analysed using statistical methods and Cox regression. An ensemble machine learning-based MSI-H score was developed using stacked XGBoost classifiers to quantify the similarity of a patient's data to MSI-H data based on immune cell distributions, clinical features, and gene mutations. The model is robust and can handle missing input data for immune cell distributions and gene mutations. The scorer revealed that some MSS CRC patients presented characteristics similar to those of MSI-H patients. The differences between MSI-H-like MSS CRCs and other MSS CRCs lie primarily in the Treg and macrophage populations within the tumour stromal region. Based on these results and prior literature, we propose a hypothesis to explain this phenomenon. The scorer has been deployed online as an open web user interface. This work presents a promising avenue for more personalized and effective cancer immunotherapy, offering a clinical reference for identifying MSS CRC patients who may benefit from ICIs.
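The abstract describes a stacked ensemble that turns several feature groups into a single MSI-H score and tolerates missing inputs. The sketch below illustrates only the final stacking step at inference time, under stated assumptions: one pre-fitted base classifier per feature group (immune cell distributions, clinical features, gene mutations) has already produced an MSI-H probability, and a logistic-regression meta-learner combines those probabilities. The meta-learner type, the weights, the bias, the group names and the fallback rule for a missing group are all hypothetical and are not taken from the authors' model; XGBoost base models can additionally handle missing values within a feature group natively, which this sketch does not show.

```python
import math
from typing import Dict, Optional

# Hypothetical meta-learner parameters. In a real stacked ensemble these would be
# learned from out-of-fold base-model predictions; the numbers here are invented.
META_WEIGHTS: Dict[str, float] = {"immune": 2.0, "clinical": 1.0, "mutation": 1.5}
META_BIAS: float = -2.25
# Assumed fallback probability when a whole feature group is missing. A value of
# 0.5 has log-odds 0, so a missing group contributes nothing to the sum.
FALLBACK_PROB: float = 0.5


def logit(p: float) -> float:
    """Log-odds of a probability, clipped away from 0 and 1 for numerical safety."""
    p = min(max(p, 1e-6), 1 - 1e-6)
    return math.log(p / (1 - p))


def msi_h_like_score(base_probs: Dict[str, Optional[float]]) -> float:
    """Combine per-group base probabilities into one score in (0, 1).

    base_probs maps a feature group name to the MSI-H probability predicted by
    that group's (hypothetical) base classifier, or None / absent if the input
    data for that group were missing.
    """
    z = META_BIAS
    for group, weight in META_WEIGHTS.items():
        p = base_probs.get(group)
        if p is None:
            p = FALLBACK_PROB  # missing group: use the neutral fallback
        z += weight * logit(p)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link of the meta-learner


# Usage: a patient with immune-cell data and clinical data but no mutation panel.
print(round(msi_h_like_score({"immune": 0.9, "clinical": 0.5, "mutation": None}), 3))
```

Working in log-odds space lets the meta-learner weight each group's evidence additively, and a neutral fallback means a missing group neither raises nor lowers the score, which is one simple way (among several) to keep a stacked model usable when a data source is unavailable.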

# co-first authors

* corresponding authors