Abstract

Ulcerative colitis is an inflammatory bowel disease characterized by chronic, recurrent, nonspecific inflammation of the intestinal tract. To identify susceptibility genes and develop a novel predictive model of ulcerative colitis, we downloaded two gene expression data sets containing ulcerative colitis cases and controls (training set GSE109142 and validation set GSE92415) and used them to identify differentially expressed genes. A total of 781 upregulated and 127 downregulated differentially expressed genes were identified in GSE109142. A random forest algorithm was then used to select the 30 differentially expressed genes (29 upregulated and 1 downregulated) that contributed most to ulcerative colitis occurrence. The expression data of these 30 genes were transformed into gene expression scores, and an artificial neural network model was developed to calculate the weight of each gene for ulcerative colitis. We established a universal molecular prognostic score (mPS) based on the expression data of the 30 genes and validated the mPS system with GSE92415. The predictions agreed with the independent data set (ROC-AUC = 0.9506, PR-AUC = 0.9747). Our research provides a reliable predictive model for the diagnosis of ulcerative colitis and an alternative marker panel for further research on early disease screening.
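
The ROC-AUC used for validation can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. A minimal Python sketch of that calculation is given here for illustration; the function name `roc_auc` and the toy labels and scores are hypothetical and are not the study's data or code.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (case, control) pairs in which the case
    has the higher score; tied pairs count as half a win."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy, made-up example: 1 = ulcerative colitis case, 0 = control.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]  # hypothetical model scores
auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a rank-based implementation would typically be preferred for large cohorts.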