Abstract

Breast cancer is one of the most common malignancies among women worldwide. This complex disease is not caused by a single gene but results from multi-gene interactions, which can be represented by biological networks. Network modules are composed of genes with significant similarities in expression, function and disease association. Therefore, identifying disease risk modules can contribute to understanding the molecular mechanisms underlying breast cancer. In this paper, we propose an integrated disease risk module identification strategy based on a multi-objective programming model for two similarity criteria, together with the significance, assessed by permutation tests, of three scores: the Markov random field module score, the function consistency score and the Pearson correlation coefficient difference score. Three breast cancer risk modules were identified from a breast cancer-related interaction network. A literature review confirmed that genes in these risk modules play critical roles in breast cancer. The risk modules were enriched in breast cancer-related pathways or functions and distinguished breast tumor from normal samples with high accuracy, not only in the microarray dataset used for risk module identification but also in two additional independent datasets. Our integrated strategy could be extended to other complex diseases to identify their risk modules and reveal their pathogenesis.