duanjun7801 2013-11-06 07:49
浏览 59
已采纳

Golang:过程花了太长时间。 实施拼写检查器

http://play.golang.org/p/H5E0ExL85d

I've implemented some Peter Norvig's spelling check algorithm with Go.

It's weird that the FIRST THREE calling works correct giving me the desired output.

But from the second, it is saying "process took too long."

Could anybody look at my code and tell what goes wrong?

Here's the snippet that is possibly going wrong.

Everything works perfect with the same code in English version.

UNICODE format and boundary have changed according to language because English contain 1 byte per alphabet and Asian languages in this case contain 3 bytes per one character.

This is trying to run the same Algorithm as the one for English that was working perfect. But this is NOT working.

total_set := []string{}
for _, elem := range splits {

    if len(elem.str2) > 3 {
        //deletion
        total_set = append(total_set, elem.str1+elem.str2[3:])

        //replace
        for i:=0; i<len(koreanletter)/3; i++ {
            total_set = append(total_set, elem.str1+string(koreanletter[3*i:3*(i+1)])+elem.str2[3:])
        }

        //transpose
        if len(elem.str2) > 9 {
            total_set = append(total_set, elem.str1+string(elem.str2[3:6])+string(elem.str2[:3])+elem.str2[9:])
        }

    } else {
        //deletion
        total_set = append(total_set, elem.str1)
    }

    //insertion
    for _, c := range koreanletter {
        total_set = append(total_set, elem.str1+string(c)+elem.str2)
    }
    return RemoveDuplicateStringArrayForKorean(total_set)
}

The one for English is below. This is working perfect.

//Edits1 is to measure the distance between strings.
func (model *Model) Edits1(word string) []string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz"

  splits := []Pair{}
  for i := 0; i <= len(word); i++ {
    splits = append(splits, Pair{word[:i], word[i:]})
  }

  total_set := []string{}
  for _, elem := range splits {

    if len(elem.str2) > 0 {
      //deletion
      total_set = append(total_set, elem.str1+elem.str2[1:])

      //replace
      for _, c := range alphabet {
        total_set = append(total_set, elem.str1+string(c)+elem.str2[1:])
      }

      //transpose
      if len(elem.str2) > 1 {
        total_set = append(total_set, elem.str1+string(elem.str2[1])+string(elem.str2[0])+elem.str2[2:])
      }

    } else {
      //deletion
      total_set = append(total_set, elem.str1)
    }

    //insertion
    for _, c := range alphabet {
      total_set = append(total_set, elem.str1+string(c)+elem.str2)
    }
  }
  return RemoveDuplicateStringArrayLowerCase(total_set)
}

Addition: ordered arguments and now I have three things working.

None of the characters are missing from the koreanletter.

Is there anyway that I can see the error more specifically? I just can't figure out.

  • 写回答

1条回答 默认 最新

  • doudandui1592 2013-11-06 11:46
    关注

    Playing around with your code, it seems it is your KoreanKnownEdits2 which is taking too long. In your forth example (the one failing), the length of model.KoreanEdits1(input_word) is 28197 and the length of the first model.KoreanEdits1(elem1) is 23499, which makes around 662 millions cases to try. It seems the program is failing after the first 147 thousands, because it takes too long (playground).

    Any examples which did not required a call to KoreanKnownEdits2 seem to work, so I suspect you should rewrite this function to avoid the exhaustive search, or at least limit it to a more reasonable size if you want to use it with the playground's time limit. I haven't studied your code in enough details to be 100% certain of that, but I suspect the 26 letters of western alphabet make it manageable for the English version, while the extended Korean alphabet makes the size of your input too large to be processed on the playground's time limit, regardless of the number of bytes each character is encoded with.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 ogg dd trandata 报错
  • ¥15 高缺失率数据如何选择填充方式
  • ¥50 potsgresql15备份问题
  • ¥15 Mac系统vs code使用phpstudy如何配置debug来调试php
  • ¥15 目前主流的音乐软件,像网易云音乐,QQ音乐他们的前端和后台部分是用的什么技术实现的?求解!
  • ¥60 pb数据库修改与连接
  • ¥15 spss统计中二分类变量和有序变量的相关性分析可以用kendall相关分析吗?
  • ¥15 拟通过pc下指令到安卓系统,如果追求响应速度,尽可能无延迟,是不是用安卓模拟器会优于实体的安卓手机?如果是,可以快多少毫秒?
  • ¥20 神经网络Sequential name=sequential, built=False
  • ¥16 Qphython 用xlrd读取excel报错