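User Requirements Identification and Trend of Evolution Mining in Online Reviews

Wang Keqin (王克勤); Gao Zhijiao (高智姣); Qiao Yanan (乔亚楠); Li Jing (李靖); Tong Shurong (同淑荣)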

doi:10.13433/j.cnki.1003-8728.20230241

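School of Management, Northwestern Polytechnical University, Xi'an 710072, China

Funding:

National Natural Science Foundation of China, Young Scientists Fund 72101204

Natural Science Foundation of Shaanxi Province 2022JM-421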

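About the author:
Wang Keqin (王克勤) (1979-), associate professor, Ph.D.; research interests: quality management and product development management; keqinwang@nwpu.edu.cn

Corresponding author:
Li Jing (李靖), associate professor, Ph.D., lijing2015@nwpu.edu.cn

Chinese Library Classification: F274; TP391.1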
Publication history
- Received: 2023-02-18
- Published: 2023-07-25

Abstract

In the Web 2.0 environment, a growing number of consumers buy products on online platforms and describe their experience in online reviews. These reviews contain a large amount of valuable information, which enterprises can use to identify and analyze user requirements and to guide subsequent product improvement. Taking review data for Lenovo laptops as the research object, this paper proposes a model for user requirement identification and evolution analysis based on online review mining. The SnowNLP model, the Kano model, and the LDA model are used to classify the reviews, identify requirements, analyze feature-sentiment pairs, and perform time series analysis. The sentiment trend forecast indicates that customer sentiment values for type 1, type 2, and type 3 are rising, while the sentiment value for type 4 is falling. In addition, users pay considerable attention to product appearance and gaming experience. By adding a time perspective, the study improves on existing methods and models for online review research, and provides a reference for analyzing user requirements and forecasting user sentiment trends toward products.

Keywords: online review mining; user requirements; sentiment analysis; LDA model; Kano model

Figure 1. Search for the optimal number of LDA topics

Figure 2. Visualization of the LDA topic analysis for basic requirements

Figure 3. Fitted curves of four methods for the type 1 product time series

Figure 4. Mean product sentiment over time and exponential smoothing forecast

Figure 5. Word clouds for the R7000P Model 1 product in each time interval

Figure 6. Evolution of user requirements

Table 1. Number of online reviews

Laptop model	Original reviews	Reviews after text deduplication	Reviews after mechanical compression and short-sentence removal
R7000P Model 1 (type 1)	2,345	1,828	1,798
R7000P Model 2 (type 2)	2,492	1,598	1,589
R9000P Model 1 (type 3)	2,865	2,292	2,273
R9000P Model 2 (type 4)	1,434	1,203	1,177
Total	9,136	6,921	6,789
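The three cleaning stages named in the Table 1 header can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the regular expression used for mechanical compression, and the minimum length of 4 characters are all assumptions, since the excerpt does not give the exact rules.

```python
import re


def deduplicate(reviews):
    """Drop exact duplicate reviews while keeping first-seen order."""
    return list(dict.fromkeys(reviews))


def compress(text):
    """Collapse immediately repeated substrings, e.g. '很好很好很好' -> '很好'.

    Assumed rule: any run of back-to-back repeats is reduced to one copy.
    This also shortens legitimate reduplicated words, so a real pipeline
    would likely need a more careful rule.
    """
    return re.sub(r"(.+?)\1+", r"\1", text)


def clean(reviews, min_len=4):
    """Deduplicate, compress, then drop reviews shorter than min_len characters.

    min_len=4 is a hypothetical threshold chosen for illustration.
    """
    unique = deduplicate(r.strip() for r in reviews)
    compressed = [compress(r) for r in unique]
    return [r for r in compressed if len(r) >= min_len]


if __name__ == "__main__":
    sample = [
        "电脑很好很好很好用",
        "电脑很好很好很好用",      # exact duplicate, removed by deduplicate()
        "好好好",                  # compresses to one character, then dropped
        "屏幕不错不错不错，速度快",
    ]
    for review in clean(sample):
        print(review)
```

The stages run in the order of the table columns (deduplication first, then compression and short-sentence removal), so the length filter sees the compressed text rather than the raw review.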

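Table 2. Relationship between the Kano model and online reviews

Kano requirement type	Change in user satisfaction	Manifestation in online reviews
Basic requirements	When provided, satisfaction rises only slightly; when absent, satisfaction drops sharply	High attention. Fulfilling this type of requirement does not prompt users to comment or share online, but when it is unmet users express strong negative sentiment online, which in turn draws the attention of more users.
Expected requirements	When provided, satisfaction rises; when absent, satisfaction falls	High attention. Comments on this type of requirement make up the bulk of user feedback, but when it is unmet the sentiment is less intense than for an unmet basic requirement.
Attractive requirements	When provided, satisfaction rises greatly; when absent, satisfaction does not fall	Moderate attention. User feedback is fairly divided, and this type of requirement is hard to pin down.
Indifferent requirements	Satisfaction does not change whether or not it is provided	Occasional attention. User reviews that have little or nothing to do with the product.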

Table 3. Classification of requirements based on online reviews

Sentiment polarity	Attention	Requirement type
Negative	High	Basic
Positive	High	Expected
Negative	Low	Expected
Positive	Low	Attractive
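The decision rule in Table 3 amounts to a two-key lookup, sketched below. The numeric inputs and both cut-off values are assumptions made for illustration: the sentiment score is taken to lie in [0, 1] (as a SnowNLP-style score would), and attention is taken to be some normalised measure of how often a feature is mentioned. The excerpt does not say how either quantity is thresholded.

```python
# Mapping copied from Table 3: (polarity, attention level) -> requirement type.
REQUIREMENT_TABLE = {
    ("negative", "high"): "basic",
    ("positive", "high"): "expected",
    ("negative", "low"): "expected",
    ("positive", "low"): "attractive",
}


def requirement_type(sentiment, attention, sentiment_cut=0.5, attention_cut=0.5):
    """Classify a feature by sentiment polarity and attention level.

    sentiment_cut and attention_cut are hypothetical thresholds; the
    source only defines the qualitative rule, not the cut-offs.
    """
    polarity = "positive" if sentiment >= sentiment_cut else "negative"
    level = "high" if attention >= attention_cut else "low"
    return REQUIREMENT_TABLE[(polarity, level)]


if __name__ == "__main__":
    print(requirement_type(sentiment=0.2, attention=0.9))
    print(requirement_type(sentiment=0.8, attention=0.1))
```

Keeping the rule in a dictionary makes it easy to compare against the table row by row.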

Table 4. Classification of review data

Requirement type	R7000P Model 1	R7000P Model 2	R9000P Model 1	R9000P Model 2
Basic requirements	901	176	140	170
Expected requirements	385	350	527	336
Attractive requirements	507	1,057	1,587	662

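Table 5. Topic analysis results for each requirement type of the R7000P Model 1 product

Requirement	Topic	Words
Basic	Price	computer, Lenovo, buy, rubbish, price cut, slow, especially, discount
Basic	Customer service	computer, customer service, restart, crash, sound, software, buy
Basic	Mouse	computer, buy, one, boot-up, exchange, will, mouse, really
Expected	Appearance	white, game, good-looking, nice, appearance, running, picture, computer
Expected	Packaging	packaging, picture, game, good-looking, shape, protection, keyboard, running
Expected	Screen	screen, looks, nice, performance, high, SSD, appearance, feel
Expected	Customer service	buy, customer service, memory, negative review, say, brand, office, hope, release
Expected	Fan	feel, thing, buy, really, especially, fan, bad, delivery, after-sales
Attractive	Game	game, packaging, nice, protection, appearance, computer, running, like
Attractive	System	nice, running, speed, quality, like, appearance, quite high, very fast
Attractive	Software	game, nice, performance, good-looking, appearance, shape, packaging, white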

Table 6. Actual mean sentiment values for type 1 and the fitted values of first-, second-, and third-order exponential smoothing and the autoregressive model

Date	Actual sentiment value	First-order exponential smoothing	Second-order exponential smoothing	Third-order exponential smoothing	Autoregressive model
2021-01-01	0.7057	0.7176	0.7176	0.7176	0.6932
2021-01-02	0.6407	0.6791	0.6791	0.6841	0.7250
2021-01-03	0.7480	0.7136	0.7039	0.7208	0.7205
2021-01-04	0.7829	0.7483	0.7448	0.7568	0.6297
2021-01-05	0.8278	0.7880	0.7973	0.7974	0.6450
2021-01-06	0.5853	0.6867	0.7099	0.6965	0.7450
2021-01-07	0.1794	0.4330	0.4320	0.4429	0.7482
2021-01-08	0.5655	0.4993	0.4231	0.5070	0.7139
2021-01-09	0.4728	0.4860	0.4078	0.4931	0.5587
2021-01-10	0.3346	0.4103	0.3474	0.4171	0.5959
2021-01-11	0.8267	0.6185	0.5600	0.6283	0.5892
2021-01-12	0.7661	0.6923	0.7027	0.7143	0.6053
2021-01-13	0.4679	0.5801	0.6408	0.5939	0.6566
2021-01-14	0.5680	0.5740	0.6166	0.5849	0.5815
2021-01-15	0.7471	0.6606	0.6820	0.6728	0.4828
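As a rough check on the first-order column of Table 6, the sketch below applies simple exponential smoothing, S_t = αx_t + (1 − α)S_{t−1}, to the actual sentiment values. The smoothing constant is not stated in this excerpt; α = 0.5 and starting from the first tabulated smoothed value (0.7176) are assumptions, chosen because they happen to reproduce the tabulated first-order values to about four decimal places. The function name is illustrative.

```python
def exponential_smoothing(values, alpha, initial):
    """Return S_1..S_n with S_1 = initial and S_t = alpha*x_t + (1-alpha)*S_{t-1}."""
    smoothed = [initial]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Actual sentiment values and first-order smoothed values copied from Table 6.
actual = [0.7057, 0.6407, 0.7480, 0.7829, 0.8278, 0.5853, 0.1794, 0.5655,
          0.4728, 0.3346, 0.8267, 0.7661, 0.4679, 0.5680, 0.7471]
tabulated = [0.7176, 0.6791, 0.7136, 0.7483, 0.7880, 0.6867, 0.4330, 0.4993,
             0.4860, 0.4103, 0.6185, 0.6923, 0.5801, 0.5740, 0.6606]

# alpha=0.5 and the initial value are assumptions, not values given in the source.
fitted = exponential_smoothing(actual, alpha=0.5, initial=tabulated[0])
max_gap = max(abs(f - t) for f, t in zip(fitted, tabulated))

if __name__ == "__main__":
    print(f"largest gap to the tabulated column: {max_gap:.5f}")
```

The second- and third-order columns depend on further choices (initialisation and the exact variant used) that the excerpt does not describe, so they are not reconstructed here.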

Table 7. Mean squared error (MSE) between the fitted values of the four models and the actual data

	First-order exponential smoothing	Second-order exponential smoothing	Third-order exponential smoothing	Autoregressive model
MSE	0.0116	0.0080	0.0078	0.0240
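The comparison metric in Table 7 is the mean of squared differences between fitted and actual values. The sketch below computes it for two of the Table 6 columns; the function name is illustrative. Table 7 presumably summarises a longer series than the 15 rows shown in Table 6, so the values obtained from this excerpt need not match the tabulated MSEs.

```python
def mean_squared_error(actual, fitted):
    """Average of squared differences between two equally long sequences."""
    if len(actual) != len(fitted):
        raise ValueError("actual and fitted must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)


# Columns copied from the 15 rows of Table 6.
actual = [0.7057, 0.6407, 0.7480, 0.7829, 0.8278, 0.5853, 0.1794, 0.5655,
          0.4728, 0.3346, 0.8267, 0.7661, 0.4679, 0.5680, 0.7471]
first_order = [0.7176, 0.6791, 0.7136, 0.7483, 0.7880, 0.6867, 0.4330, 0.4993,
               0.4860, 0.4103, 0.6185, 0.6923, 0.5801, 0.5740, 0.6606]
autoregressive = [0.6932, 0.7250, 0.7205, 0.6297, 0.6450, 0.7450, 0.7482, 0.7139,
                  0.5587, 0.5959, 0.5892, 0.6053, 0.6566, 0.5815, 0.4828]

mse_first = mean_squared_error(actual, first_order)
mse_ar = mean_squared_error(actual, autoregressive)

if __name__ == "__main__":
    print(f"first-order smoothing MSE on the excerpt: {mse_first:.4f}")
    print(f"autoregressive model MSE on the excerpt:  {mse_ar:.4f}")
```

On these rows the autoregressive fit has the larger error, which agrees in direction with Table 7.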

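Table 8. High-frequency words for the R7000P Model 1 product in each time interval

Time interval	High-frequency words	Cumulative high-frequency words
2020-09	fan, amazing machine, shipping, screen, system, communication, review, memory	-
2020-10	electric current, feel, video, price, price cut, price protection, bubble, flat surface	-
2020-11	game, shape, appearance, software, screen, picture, fan, boot-up	-
2020-12	shape, appearance, white, quality, picture, game, effect, speed	-
2021-01	feel, speed, build quality, looks, white, boot-up, shape, appearance	shape, appearance
2021-02	white, speed, screen, after-sales, customer service, price cut, feel, looks	white, screen, speed
2021-03	white, speed, screen, game, feel, keyboard, looks, customer service	game, feel, looks
2021-04	speed, performance, screen, feel, picture, quality, appearance, shape	picture
2021-05	game, keyboard, price, speed, screen, office work, performance, review	price
2021-06	appearance, speed, boot-up, after-sales, human agent, customer service, price, master	boot-up, customer service
2021-07	camera, boot-up, keyboard, amazing machine, experience, feel, no lag	keyboard
2021-08	value for money, scratch, degree, studies	-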
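A table like Table 8 can be produced by counting tokens per calendar month, as in the sketch below. It assumes the reviews have already been segmented into words and filtered for stop words; the record layout, the function name, and the cut-off of eight words per month are illustrative assumptions. The rule behind the "cumulative high-frequency words" column is not described in this excerpt, so it is not implemented.

```python
from collections import Counter, defaultdict


def top_words_by_month(records, k=8):
    """Return the k most frequent tokens for each 'YYYY-MM' month.

    records: iterable of (date, tokens) pairs, where date is a
    'YYYY-MM-DD' string and tokens is a list of already segmented words.
    """
    counts = defaultdict(Counter)
    for date, tokens in records:
        counts[date[:7]].update(tokens)   # group by the year-month prefix
    return {
        month: [word for word, _ in counter.most_common(k)]
        for month, counter in sorted(counts.items())
    }


if __name__ == "__main__":
    # Toy records for illustration only, not data from the study.
    records = [
        ("2020-09-03", ["风扇", "屏幕", "风扇"]),
        ("2020-09-20", ["屏幕", "风扇"]),
        ("2020-10-01", ["价格", "价格", "视频"]),
    ]
    for month, words in top_words_by_month(records, k=2).items():
        print(month, words)
```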

Table 9. LDA topic analysis results for the first three time intervals of the R7000P Model 1 product

Time	Word	Probability	Word	Probability	Word	Probability	Word	Probability	Word	Probability
2020-09	express delivery	0.132	logistics	0.131	service	0.089	speed	0.026	attitude	0.026
2020-09	cooling	0.193	fan	0.059	quality	0.032	difference	0.032	boot-up	0.032
2020-09	appearance	0.157	bubble	0.088	stylish	0.088	beautiful	0.087	sound	0.087
2020-10	picture	0.131	appearance	0.075	feel	0.073	price	0.073	rubbish	0.073
2020-10	speed	0.248	shape	0.130	price cut	0.072	game	0.072	boot-up	0.072
2020-10	quality	0.069	feel	0.069	fan	0.039	beast	0.038	network	0.038
2020-11	software	0.081	system	0.056	screen	0.056	price cut	0.056	review	0.055
2020-11	game	0.176	shape	0.069	speed	0.048	screen	0.028	light leakage	0.027
2020-11	appearance	0.077	picture	0.054	machine	0.054	effect	0.042	boot-up	0.030
