duanjun7801 2013-11-06 07:49
浏览 59
已采纳

Golang:过程花了太长时间。 实施拼写检查器

http://play.golang.org/p/H5E0ExL85d

I've implemented some Peter Norvig's spelling check algorithm with Go.

It's weird that the FIRST THREE calling works correct giving me the desired output.

But from the second, it is saying "process took too long."

Could anybody look at my code and tell what goes wrong?

Here's the snippet that is possibly going wrong.

Everything works perfect with the same code in English version.

UNICODE format and boundary have changed according to language because English contain 1 byte per alphabet and Asian languages in this case contain 3 bytes per one character.

This is trying to run the same Algorithm as the one for English that was working perfect. But this is NOT working.

total_set := []string{}
for _, elem := range splits {

    if len(elem.str2) > 3 {
        //deletion
        total_set = append(total_set, elem.str1+elem.str2[3:])

        //replace
        for i:=0; i<len(koreanletter)/3; i++ {
            total_set = append(total_set, elem.str1+string(koreanletter[3*i:3*(i+1)])+elem.str2[3:])
        }

        //transpose
        if len(elem.str2) > 9 {
            total_set = append(total_set, elem.str1+string(elem.str2[3:6])+string(elem.str2[:3])+elem.str2[9:])
        }

    } else {
        //deletion
        total_set = append(total_set, elem.str1)
    }

    //insertion
    for _, c := range koreanletter {
        total_set = append(total_set, elem.str1+string(c)+elem.str2)
    }
    return RemoveDuplicateStringArrayForKorean(total_set)
}

The one for English is below. This is working perfect.

//Edits1 is to measure the distance between strings.
func (model *Model) Edits1(word string) []string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz"

  splits := []Pair{}
  for i := 0; i <= len(word); i++ {
    splits = append(splits, Pair{word[:i], word[i:]})
  }

  total_set := []string{}
  for _, elem := range splits {

    if len(elem.str2) > 0 {
      //deletion
      total_set = append(total_set, elem.str1+elem.str2[1:])

      //replace
      for _, c := range alphabet {
        total_set = append(total_set, elem.str1+string(c)+elem.str2[1:])
      }

      //transpose
      if len(elem.str2) > 1 {
        total_set = append(total_set, elem.str1+string(elem.str2[1])+string(elem.str2[0])+elem.str2[2:])
      }

    } else {
      //deletion
      total_set = append(total_set, elem.str1)
    }

    //insertion
    for _, c := range alphabet {
      total_set = append(total_set, elem.str1+string(c)+elem.str2)
    }
  }
  return RemoveDuplicateStringArrayLowerCase(total_set)
}

Addition: ordered arguments and now I have three things working.

None of the characters are missing from the koreanletter.

Is there anyway that I can see the error more specifically? I just can't figure out.

  • 写回答

1条回答 默认 最新

  • doudandui1592 2013-11-06 11:46
    关注

    Playing around with your code, it seems it is your KoreanKnownEdits2 which is taking too long. In your forth example (the one failing), the length of model.KoreanEdits1(input_word) is 28197 and the length of the first model.KoreanEdits1(elem1) is 23499, which makes around 662 millions cases to try. It seems the program is failing after the first 147 thousands, because it takes too long (playground).

    Any examples which did not required a call to KoreanKnownEdits2 seem to work, so I suspect you should rewrite this function to avoid the exhaustive search, or at least limit it to a more reasonable size if you want to use it with the playground's time limit. I haven't studied your code in enough details to be 100% certain of that, but I suspect the 26 letters of western alphabet make it manageable for the English version, while the extended Korean alphabet makes the size of your input too large to be processed on the playground's time limit, regardless of the number of bytes each character is encoded with.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥20 我想使用一些网络协议或者部分协议也行,主要想实现类似于traceroute的一定步长内的路由拓扑功能
  • ¥30 深度学习,前后端连接
  • ¥15 孟德尔随机化结果不一致
  • ¥15 apm2.8飞控罗盘bad health,加速度计校准失败
  • ¥15 求解O-S方程的特征值问题给出边界层布拉休斯平行流的中性曲线
  • ¥15 谁有desed数据集呀
  • ¥20 手写数字识别运行c仿真时,程序报错错误代码sim211-100
  • ¥15 关于#hadoop#的问题
  • ¥15 (标签-Python|关键词-socket)
  • ¥15 keil里为什么main.c定义的函数在it.c调用不了