Journal of Southwest Petroleum University (Science & Technology Edition), 2020, Vol. 42, Issue 6: 157-164. DOI: 10.11885/j.issn.1674-5086.2020.05.12.06

• Special Issue on Artificial Intelligence Technology and Applications in Oil and Gas Fields •

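Question Answering System for Drilling Safety Based on Tri-BiLSTM-CNN

WANG Bing1, ZHENG Yamei1, CHEN Maoke2, GAO Lingyun2

  1. School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500, China;
  2. Southwest Branch of China Petroleum Logging Co. Ltd., Yubei, Chongqing 401120, China
  • Received: 2020-05-12 Published: 2020-12-21
  • Corresponding author: WANG Bing, E-mail: w9521423@sina.com
  • About the authors: WANG Bing, born in 1977, male, Han nationality, from Nanchong, Sichuan, associate professor, mainly engaged in research on drilling safety evaluation and on data analysis and mining. E-mail: w9521423@sina.com; ZHENG Yamei, born in 1994, female, Han nationality, from Meishan, Sichuan, master's student, mainly engaged in research on deep learning and natural language processing. E-mail: 1005094519@qq.com; CHEN Maoke, born in 1974, male, Han nationality, from Longchang, Sichuan, mainly engaged in research on well logging safety production management. E-mail: 8892271@qq.com; GAO Lingyun, born in 1984, male, Han nationality, from Luzhou, Sichuan, mainly engaged in research on well site informatization and drilling instrumentation. E-mail: 403815459@qq.com
  • Funding:
    National Science and Technology Major Project (2016ZX05020-006)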

Abstract: FAQ question answering systems in specific domains usually face three problems: (1) how to represent sentences semantically in an effective way; (2) how to match sentences semantically; and (3) how to segment domain-specific vocabulary. To address these problems, a deep learning model based on Tri-BiLSTM-CNN (Triplet BiLSTM-CNN) is proposed. First, a bidirectional long short-term memory network (BiLSTM) and a convolutional neural network (CNN) are combined to build the network model, exploiting the strength of BiLSTM in processing sequential data and the strength of CNN in capturing local features. Then, a Triplet parallel structure is used to match sentences. Finally, character vectors are used instead of word vectors, so that word segmentation errors do not affect the model. Experiments on a real data set from the drilling safety domain show that the Tri-BiLSTM-CNN model represents sentence semantics as vectors more effectively, significantly improves the accuracy of sentence similarity calculation, and clearly outperforms both the CNN and the LSTM network structures. Applied to an FAQ question answering system for drilling safety, the model effectively reduces labor costs and has practical value for improving the efficiency and quality of drilling work.

Key words: drilling safety, question answering system, bidirectional long short-term memory, convolutional neural network, sentence similarity computation
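
The abstract names two ideas that a few lines of code can make concrete: character-level input, which avoids a word segmenter, and a Triplet arrangement that compares a question with one matching and one non-matching sentence. The sketch below is illustrative only and is not the authors' implementation. It assumes cosine similarity as the similarity measure and a hinge-style margin loss, and the function names, the margin value, and the toy vectors are all hypothetical. In the model described above, the sentence vectors would come from the BiLSTM-CNN encoder rather than being written by hand.

```python
import math


def char_tokens(sentence):
    """Split a sentence into single characters, skipping whitespace.

    No word segmenter is involved, so segmentation errors on domain
    vocabulary cannot occur at this step.
    """
    return [ch for ch in sentence if not ch.isspace()]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine similarity (assumed form).

    The loss is zero once the anchor is more similar to the positive
    sentence than to the negative sentence by at least `margin`.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


if __name__ == "__main__":
    # Hypothetical 2-dimensional sentence vectors standing in for encoder outputs.
    question = [1.0, 0.0]
    matching_faq = [1.0, 0.0]
    unrelated_faq = [0.0, 1.0]
    print(char_tokens("井喷 预防"))
    print(triplet_loss(question, matching_faq, unrelated_faq))
```

During training, a loss of this kind pushes the encoder to place a question closer to its matching FAQ entry than to unrelated ones. At query time, only the similarity function is needed to rank FAQ entries against a user question.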
