Journal of Dali University ›› 2019, Vol. 4 ›› Issue (12): 1-5.

• Mathematics and Computer Science •

Outlier Detection in Uncertain Categorical Data Based on Weighted Expected Semantic Distance

  

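  1. College of Mathematics and Computer, Dali University, Dali, Yunnan 671003, China
  • Received: 2018-10-18 Online: 2019-12-15 Published: 2019-12-15
  • About the author: Zhao Qinyi, associate professor, mainly engaged in research on spatial data mining algorithms.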

Weighted Expected Semantic Distance Based on Outlier Detection of Uncertain Classification Data

  1. College of Mathematics and Computer, Dali University, Dali, Yunnan 671003, China
  • Received: 2018-10-18 Online: 2019-12-15 Published: 2019-12-15

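Abstract: Outlier detection on uncertain data identifies the objects in an uncertain data set that differ from most other objects.
This paper measures the distance between objects with the expected semantic distance and proposes a method for computing a
weighted expected semantic distance. Attribute weighting lets each attribute contribute to the distance measure according to its
importance, which makes the detection results more application-driven and more effective. The algorithm detects outliers in
categorical data sets and avoids the inaccurate results that arise when conventional outlier detection methods ignore the
differences between objects in the database. Experimental results show that outlier detection based on the weighted expected
semantic distance overcomes the shortcomings of traditional distance measures in outlier detection algorithms and improves
algorithm performance.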

Key words: outlier detection, weighted expected semantic distance, uncertain data

Abstract: Outlier detection on uncertain data finds objects that differ from most other objects in an uncertain data set.
The distance between objects is measured by the expected semantic distance, and a method for calculating the weighted expected
semantic distance is proposed. Attribute weighting reflects the different contribution of each attribute to the expected semantic
distance metric, making the outlier detection results more application-driven and effective. The algorithm performs outlier
detection on categorical data sets, which avoids the inaccurate results produced when conventional outlier detection methods do
not consider the differences between objects in the database. The experimental results show that the weighted expected semantic
distance outlier detection method for categorical data overcomes the shortcomings of traditional distance metrics in outlier
detection algorithms and optimizes the performance of the algorithm.

Key words: outlier detection, weighted expected semantic distance, uncertain data
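
The abstract does not give the formulas, so the following is only an illustrative sketch of how a weighted expected semantic distance might feed an outlier score, not the paper's actual algorithm. It assumes that each uncertain categorical attribute is a probability distribution over category values, that the semantic distance between two values comes from a hypothetical two-level taxonomy (`GROUP`), that attribute weights are supplied by the user, and that the outlier score is the distance to the k-th nearest neighbour (a common distance-based rule swapped in here because the abstract does not state the detection rule). All names, weights, and data are made up for illustration.

```python
from itertools import product

# Hypothetical two-level taxonomy: category value -> parent group (assumption).
GROUP = {"cat": "mammal", "dog": "mammal", "eagle": "bird",
         "red": "warm", "blue": "cool"}


def taxonomy_distance(a, b):
    """Assumed semantic distance: 0 if equal, 0.5 if same group, else 1."""
    if a == b:
        return 0.0
    return 0.5 if GROUP[a] == GROUP[b] else 1.0


def expected_semantic_distance(p, q, sem_dist):
    """Expected distance between two uncertain categorical values.

    p and q map category values to probabilities; the expectation is taken
    over all value pairs, treating the two distributions as independent.
    """
    return sum(pa * qb * sem_dist(a, b)
               for (a, pa), (b, qb) in product(p.items(), q.items()))


def weighted_distance(x, y, weights, sem_dist):
    """Weighted average of per-attribute expected semantic distances."""
    total = sum(weights)
    return sum(w * expected_semantic_distance(px, qy, sem_dist)
               for w, px, qy in zip(weights, x, y)) / total


def knn_outlier_scores(objects, weights, sem_dist, k=2):
    """Score each object by its distance to its k-th nearest neighbour."""
    scores = []
    for i, x in enumerate(objects):
        dists = sorted(weighted_distance(x, y, weights, sem_dist)
                       for j, y in enumerate(objects) if j != i)
        scores.append(dists[k - 1])
    return scores


# Toy data: each object has two uncertain attributes (animal, colour).
objects = [
    [{"cat": 0.9, "dog": 0.1}, {"red": 1.0}],
    [{"cat": 0.2, "dog": 0.8}, {"red": 0.7, "blue": 0.3}],
    [{"dog": 1.0}, {"red": 1.0}],
    [{"eagle": 1.0}, {"blue": 1.0}],
]
weights = [2.0, 1.0]  # assumed: the animal attribute matters twice as much

scores = knn_outlier_scores(objects, weights, taxonomy_distance, k=2)
most_outlying = scores.index(max(scores))
```

In this toy data set the last object is far from the other three on both attributes, so it receives the largest score. Changing `weights` shifts how much each attribute contributes, which is the effect the abstract attributes to attribute weighting.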