2017, Vol. 2, Issue (12): 16-20.

• Mathematics and Computer Science •

The Classification of K-Nearest Neighbor over Uncertain Data Based on Expected Semantic Distance

  1. (College of Mathematics and Computer, Dali University, Dali, Yunnan 671003, China)
  • Received: 2017-05-12 Online: 2017-12-15 Published: 2017-12-15
  • About the author: Zhao Qinyi (赵秦怡), associate professor, mainly engaged in data mining research.

Abstract: Uncertain data falls into two main types: tuple-existence uncertainty and attribute-value uncertainty. For the latter, a k-nearest neighbor classification algorithm is proposed. Object attributes in the algorithm are discrete, and the uncertainty of each attribute value is described by a probability distribution vector. The semantic distance between attribute component values is first computed from a concept hierarchy tree, and the expected semantic distances between attributes and between objects are then derived from it. Experiments on classification accuracy indicate that the algorithm classifies uncertain data with high accuracy.

Key words: classification, KNN classifier, uncertain data, expected semantic distance
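
The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy hierarchy `PARENT`, the choice of tree edge count as the semantic distance, the treatment of the two distributions as independent when taking the expectation, the summation of attribute distances into an object distance, and all function names.

```python
from collections import Counter

# Hypothetical concept hierarchy: child -> parent (the root maps to None).
PARENT = {
    "any": None,
    "warm": "any", "cool": "any",
    "red": "warm", "orange": "warm",
    "blue": "cool", "green": "cool",
}

def path_to_root(value):
    """Return the nodes from value up to the root, inclusive."""
    path = []
    while value is not None:
        path.append(value)
        value = PARENT[value]
    return path

def semantic_distance(a, b):
    """Number of tree edges between a and b (an assumed definition)."""
    depth_in_a = {node: i for i, node in enumerate(path_to_root(a))}
    for steps_b, node in enumerate(path_to_root(b)):
        if node in depth_in_a:  # first shared ancestor on the way up
            return depth_in_a[node] + steps_b
    raise ValueError("values are not in the same tree")

def expected_attribute_distance(p, q):
    """Expected distance between two distributions given as {value: prob}."""
    return sum(pa * qb * semantic_distance(a, b)
               for a, pa in p.items() for b, qb in q.items())

def object_distance(x, y):
    """Sum of expected attribute distances over aligned attributes."""
    return sum(expected_attribute_distance(p, q) for p, q in zip(x, y))

def knn_classify(query, training, k=3):
    """training is a list of (object, label); majority vote among the k nearest."""
    nearest = sorted(training, key=lambda item: object_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In this sketch each object is a list of attributes and each attribute is a dictionary mapping a discrete value to its probability, so `knn_classify([{"red": 0.8, "orange": 0.2}], training, k=3)` compares the query with every training object by expected distance before voting.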

CLC number: