就是这味道 2015-07-15 12:10 Acceptance rate: 0%
Views: 3618

Python vector space model similarity calculation: the script never runs successfully, help wanted

# Compute the similarity between two strings s and s1 using the vector space model

from math import sqrt
from collections import Counter
import re

def vsm_distance(s, s1):
    # Convert s and s1 to dictionaries of the form {word: frequency}
    mylist = re.findall(r"\w+", s)
    ss = Counter(mylist)
    mylist1 = re.findall(r"\w+", s1)
    ss1 = Counter(mylist1)
    # Vector space calculation: dot product over the words both strings share
    c = set(ss.keys()) & set(ss1.keys())
    if not c:
        return 0
    x = sum(ss[i] * ss1[i] for i in c)
    # Vector norms: sum the squared frequencies directly
    sq1 = sqrt(sum(v ** 2 for v in ss.values()))
    sq2 = sqrt(sum(v ** 2 for v in ss1.values()))
    p = float(x) / (sq1 * sq2)
    return p

s="KBA is to give a chance to non-popular entities information to be updated as soon as a useful information is published on the internet. The KBA organizershave built up a stream-corpus which is a huge corpus of timestamped web documents that can be processed chronologically. Hence it is possible to simulate a real time system. The documents come from newswires, blogs, forums, review, memetracker….. In addition, a set of target entities, coming from wikipedia or from twitter, has been selected for their ambiguity or unpopularity. And last but not least, more than 60000 documents have been annotated so that systems can train on it. The train period starts on documents published from october 2011 until februray, and the test period starts from februray 2012 to februray 2013."

s1="The KBA track is divided in two tasks:CCR(Cumulative Citation Recommendation) and SSF(Streaming Slot Filling). CCR task is to filter out documents worth citing in a profile of an entity(e.g., wikipedia or freebase article). SSF task is to detect changes on given slots for each of the target entities. This article is focused only on CCR task."

print(vsm_distance(s, s1))

3 answers

  • oyljerry 2015-07-15 12:54

    When you say it fails to run, do you mean there is a syntax error, or that the result is incorrect?
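
One runtime failure worth ruling out in word-frequency code like this: a `Counter` maps words to counts, so calling `.get()` with a count instead of a word returns `None`, and squaring `None` raises a `TypeError`. The sketch below shows that pitfall next to a cosine similarity whose norms are taken from the counts directly. The function name `cosine_similarity` and the short sample inputs are illustrative assumptions, not part of the original post.

```python
from collections import Counter
from math import sqrt
import re


def cosine_similarity(a, b):
    """Cosine similarity of two strings over their word-frequency vectors."""
    ca = Counter(re.findall(r"\w+", a))
    cb = Counter(re.findall(r"\w+", b))
    shared = set(ca) & set(cb)
    if not shared:
        return 0.0
    # Dot product only needs the words that occur in both strings.
    dot = sum(ca[w] * cb[w] for w in shared)
    # Norms use every count of each string, taken straight from values().
    norm_a = sqrt(sum(n * n for n in ca.values()))
    norm_b = sqrt(sum(n * n for n in cb.values()))
    return dot / (norm_a * norm_b)


# Pitfall: the keys are words, so looking up a count finds nothing.
counts = Counter(["a", "a", "b"])
looked_up = [counts.get(n) for n in counts.values()]
```

Here `looked_up` contains only `None` values, which is why passing those entries to `pow(..., 2)` fails. Iterating over `values()` and squaring each count avoids the lookup altogether.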
