J4 ›› 2015, Vol. 14 ›› Issue (6): 1-4.

• Mathematics and Computer Science •

A Feature Gene Selection Method Based on Traversal Gene Combination

  1. College of Mathematics and Computer, Dali University, Dali, Yunnan 671003, China
  • Received: 2015-03-13 Online: 2015-06-15 Published: 2015-06-15
  • About the author: Li Jie (李杰), teaching assistant; research interests include missing data, variable selection, dimensionality reduction, and big data analysis.
  • Funding:

    Youth Fund Project of the Yunnan Provincial Department of Science and Technology (2013FD037); Young Teachers' Research Fund Project of Dali University (KYQN201219)

Abstract:

Feature gene selection is an active research topic. Under the assumption that cancer arises from mutations driven by a single gene or
by the interaction of several genes, this article starts from the simplest case, combinations of two genes. It traverses all possible
two-gene combinations, fits a Logistic regression classifier to each, and evaluates all simulation results using prediction accuracy and
the AIC criterion, which yields the best gene combination (X55187, D14812). Leave-one-out cross-validation is then used to verify the
stability of the model built on this combination. Finally, a frequency analysis of the 640 gene pairs whose prediction accuracy exceeds
90%, compared with earlier studies, shows that gene combinations that occur with high frequency do not necessarily achieve high
prediction accuracy.
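The procedure summarized in the abstract (traverse every two-gene combination, fit a Logistic regression model to each, score it by prediction accuracy and AIC, check the best pair with leave-one-out cross-validation, then count gene frequencies among high-accuracy pairs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gene names gA to gD and the synthetic data are hypothetical, the model is fitted with plain gradient ascent from the standard library rather than whatever software the study used, and ranking by accuracy first with AIC as tie-breaker is one possible way to combine the two criteria.

```python
import math
import random
from collections import Counter
from itertools import combinations


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """P(y = 1 | x) for weights w = [intercept, w1, w2, ...]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, steps=500, lr=0.1):
    """Fit logistic regression by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj + lr * gj / len(y) for wj, gj in zip(w, grad)]
    return w


def accuracy_and_aic(X, y, w):
    """Training accuracy and AIC = 2k - 2 ln L for a fitted model."""
    loglik, hits = 0.0, 0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(w, xi), 1e-12), 1 - 1e-12)
        loglik += math.log(p) if yi == 1 else math.log(1 - p)
        hits += int((p >= 0.5) == (yi == 1))
    return hits / len(y), 2 * len(w) - 2 * loglik


def rank_pairs(expr, y):
    """Fit one model per gene pair; sort by accuracy (desc), then AIC (asc).

    The ordering rule is an assumption made for this sketch.
    """
    ranked = []
    for g1, g2 in combinations(sorted(expr), 2):
        X = [list(row) for row in zip(expr[g1], expr[g2])]
        acc, aic = accuracy_and_aic(X, y, fit_logistic(X, y))
        ranked.append((acc, aic, (g1, g2)))
    ranked.sort(key=lambda r: (-r[0], r[1]))
    return ranked


def loocv_accuracy(X, y):
    """Leave-one-out cross-validation accuracy for one gene pair."""
    hits = 0
    for i in range(len(y)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += int((predict_proba(w, X[i]) >= 0.5) == (y[i] == 1))
    return hits / len(y)


def gene_frequencies(ranked, threshold=0.9):
    """Count how often each gene occurs in pairs with accuracy above threshold."""
    counts = Counter()
    for acc, _aic, pair in ranked:
        if acc > threshold:
            counts.update(pair)
    return counts


def make_toy_data(n=20, seed=0):
    """Hypothetical synthetic data: gA tracks the label, gB..gD are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    expr = {"gA": [(2.0 if yi else -2.0) + rng.uniform(-0.2, 0.2) for yi in y]}
    for name in ("gB", "gC", "gD"):
        expr[name] = [rng.uniform(-1.0, 1.0) for _ in y]
    return expr, y


expr, y = make_toy_data()
ranked = rank_pairs(expr, y)
best_acc, best_aic, best_pair = ranked[0]
X_best = [list(row) for row in zip(expr[best_pair[0]], expr[best_pair[1]])]
print("best pair:", best_pair, "accuracy:", best_acc, "AIC:", round(best_aic, 2))
print("LOOCV accuracy:", loocv_accuracy(X_best, y))
print("gene frequencies:", gene_frequencies(ranked))
```

With p candidate genes the traversal fits p(p-1)/2 models, so the cost grows quadratically in the number of genes; that is manageable for pairs but rises quickly for larger combinations.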

Key words: colon cancer, feature genes, Logistic regression, traversal

CLC number: