# encoding:utf-8
import kenlm
import jieba
import time
# Alternative: load the text ARPA model instead of the binary (slower to load)
# model = kenlm.Model('build/my_model/douban.arpa')
model = kenlm.Model('../kenlm/my_model/douban.bin')
test_list = ["您好!常见的玻璃水较好的品牌有威猛先生玻璃水、龟牌玻璃水、车仆玻璃水、博世玻璃水、长城玻璃水,如何选择,可以建议通过相关论坛进行了解。",
"您好!此类故障可能是油门踏板失灵,节气门故障或油泵压力不足引起的。",
"您好!定金是支付给商家作为车款确认的保证,如未按约定时间交车,应支付相应的违约金。",
"您好!异响情况复杂,很难维权;除非您对车辆进行有针对性的鉴定,查明具体原因,再要求解决。可以通过投诉的途径与厂家进行协商。",
"您好打扫i到静安寺哦记得十三点较快拉升阶段了杀敌哦阿姐斯大林看见撒娇的了静安寺哦啊是",
"您好!你的车辆非常不足,品质优秀,油门踏板失灵且加速非常快"]
# Segment each sentence into words with jieba and join the words with spaces
seg_list = [' '.join(jieba.lcut(sentence)) for sentence in test_list]
for sentence in seg_list:
    time_start = time.time()
    score = model.score(sentence, bos=True, eos=True)  # log10 probability of the whole sentence
    elapsed = time.time() - time_start
    print(sentence, ': ', score, 'time: ', elapsed)
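
# The scores printed above are total log10 probabilities, so longer sentences
# tend to score lower simply because they contain more tokens. Below is a
# minimal sketch of a length-normalized comparison. The helper name
# `per_token_perplexity` is hypothetical and not part of kenlm. It assumes the
# score was computed with bos=True and eos=True, so the end-of-sentence marker
# is counted as one extra token.
def per_token_perplexity(log10_score, segmented_sentence):
    """Convert a total log10 probability into a per-token perplexity."""
    # Whitespace-separated tokens, plus one for the end-of-sentence marker.
    n_tokens = len(segmented_sentence.split()) + 1
    return 10.0 ** (-log10_score / n_tokens)

# Possible usage with the objects defined above (lower suggests more fluent text):
# for s in seg_list:
#     print(s, per_token_perplexity(model.score(s, bos=True, eos=True), s))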