good-turing平滑方法的缺点改进

1个回答

``````这个问题是我做毕设初期的疑惑，但是后面自己就可以解答了。

那比较实用的数据平滑方法有如下：留存插值法，Jelinek-Mercer 方法，Witten-Bell方法，Kneser-Ney方法，Absolute discounting方法，Modified Kneser-Ney方法以及后备模型下的Katz平滑方法。大多数都是不仅仅用了一种平滑思想。对于这些方法，使用效果说法不一。但是目前我所看到的最多的是 说Kneser-Ney方法最佳，
我在毕设中实际用到的也是 Kneser-Ney方法，但是很不幸，所选数据还是会出现0的现象，但是已经比Good-turing好太多。于是没有别的办法，我把中间过程的数据0改为了0.1。这也是无奈之举，不过确实不糊影响到实际的概率分布状况。其实也是无奈之举，毕竟自己短期做个项目，数据不可能都训练到，只能自己根据凭据平滑的原则，“机智地”变更了一下下。
对此，推荐一本书：宗成庆的《统计自然语言》。里面会有详细系统的讲解。
``````

srilm测设问题及显示问题，谢谢大神帮忙！！！！！

Turing Tree 树的算法
Problem Description After inventing Turing Tree, 3xian always felt boring when solving problems about intervals, because Turing Tree could easily have the solution. As well, wily 3xian made lots of new problems about intervals. So, today, this sick thing happens again... Now given a sequence of N numbers A1, A2, ..., AN and a number of Queries(i, j) (1≤i≤j≤N). For each Query(i, j), you are to caculate the sum of distinct values in the subsequence Ai, Ai+1, ..., Aj. Input The first line is an integer T (1 ≤ T ≤ 10), indecating the number of testcases below. For each case, the input format will be like this: * Line 1: N (1 ≤ N ≤ 30,000). * Line 2: N integers A1, A2, ..., AN (0 ≤ Ai ≤ 1,000,000,000). * Line 3: Q (1 ≤ Q ≤ 100,000), the number of Queries. * Next Q lines: each line contains 2 integers i, j representing a Query (1 ≤ i ≤ j ≤ N). Output For each Query, print the sum of distinct values of the specified subsequence in one line. Sample Input 2 3 1 1 4 2 1 2 2 3 5 1 1 2 1 3 3 1 5 2 4 3 5 Sample Output 1 5 6 3 6

Problem Description After inventing Turing Tree, 3xian always felt boring when solving problems about intervals, because Turing Tree could easily have the solution. As well, wily 3xian made lots of new problems about intervals. So, today, this sick thing happens again... Now given a sequence of N numbers A1, A2, ..., AN and a number of Queries(i, j) (1≤i≤j≤N). For each Query(i, j), you are to caculate the sum of distinct values in the subsequence Ai, Ai+1, ..., Aj. Input The first line is an integer T (1 ≤ T ≤ 10), indecating the number of testcases below. For each case, the input format will be like this: * Line 1: N (1 ≤ N ≤ 30,000). * Line 2: N integers A1, A2, ..., AN (0 ≤ Ai ≤ 1,000,000,000). * Line 3: Q (1 ≤ Q ≤ 100,000), the number of Queries. * Next Q lines: each line contains 2 integers i, j representing a Query (1 ≤ i ≤ j ≤ N). Output For each Query, print the sum of distinct values of the specified subsequence in one line. Sample Input 2 3 1 1 4 2 1 2 2 3 5 1 1 2 1 3 3 1 5 2 4 3 5 Sample Output 1 5 6 3 6
Turing Tree
Problem Description After inventing Turing Tree, 3xian always felt boring when solving problems about intervals, because Turing Tree could easily have the solution. As well, wily 3xian made lots of new problems about intervals. So, today, this sick thing happens again... Now given a sequence of N numbers A1, A2, ..., AN and a number of Queries(i, j) (1≤i≤j≤N). For each Query(i, j), you are to caculate the sum of distinct values in the subsequence Ai, Ai+1, ..., Aj. Input The first line is an integer T (1 ≤ T ≤ 10), indecating the number of testcases below. For each case, the input format will be like this: * Line 1: N (1 ≤ N ≤ 30,000). * Line 2: N integers A1, A2, ..., AN (0 ≤ Ai ≤ 1,000,000,000). * Line 3: Q (1 ≤ Q ≤ 100,000), the number of Queries. * Next Q lines: each line contains 2 integers i, j representing a Query (1 ≤ i ≤ j ≤ N). Output For each Query, print the sum of distinct values of the specified subsequence in one line. Sample Input 2 3 1 1 4 2 1 2 2 3 5 1 1 2 1 3 3 1 5 2 4 3 5 Sample Output 1 5 6 3 6
YAPTCHA
Problem Description The math department has been having problems lately. Due to immense amount of unsolicited automated programs which were crawling across their pages, they decided to put Yet-Another-Public-Turing-Test-to-Tell-Computers-and-Humans-Apart on their webpages. In short, to get access to their scientific papers, one have to prove yourself eligible and worthy, i.e. solve a mathematic riddle. However, the test turned out difficult for some math PhD students and even for some professors. Therefore, the math department wants to write a helper program which solves this task (it is not irrational, as they are going to make money on selling the program). The task that is presented to anyone visiting the start page of the math department is as follows: given a natural n, compute ![](http://acm.hdu.edu.cn/data/images/con181-1009-1.JPG) where [x] denotes the largest integer not greater than x. Input The first line contains the number of queries t (t <= 10^6). Each query consist of one natural number n (1 <= n <= 10^6). Output For each n given in the input output the value of Sn. Sample Input 13 1 2 3 4 5 6 7 8 9 10 100 1000 10000 Sample Output 0 1 1 2 2 2 2 3 3 4 28 207 1609

Problem Description After inventing Turing Tree, 3xian always felt boring when solving problems about intervals, because Turing Tree could easily have the solution. As well, wily 3xian made lots of new problems about intervals. So, today, this sick thing happens again... Now given a sequence of N numbers A1, A2, ..., AN and a number of Queries(i, j) (1≤i≤j≤N). For each Query(i, j), you are to caculate the sum of distinct values in the subsequence Ai, Ai+1, ..., Aj. Input The first line is an integer T (1 ≤ T ≤ 10), indecating the number of testcases below. For each case, the input format will be like this: * Line 1: N (1 ≤ N ≤ 30,000). * Line 2: N integers A1, A2, ..., AN (0 ≤ Ai ≤ 1,000,000,000). * Line 3: Q (1 ≤ Q ≤ 100,000), the number of Queries. * Next Q lines: each line contains 2 integers i, j representing a Query (1 ≤ i ≤ j ≤ N). Output For each Query, print the sum of distinct values of the specified subsequence in one line. Sample Input 2 3 1 1 4 2 1 2 2 3 5 1 1 2 1 3 3 1 5 2 4 3 5 Sample Output 1 5 6 3 6
Hard to Believe, but True!
Description The fight goes on, whether to store numbers starting with their most significant digit or their least significant digit. Sometimes this is also called the "Endian War". The battleground dates far back into the early days of computer science. Joe Stoy, in his (by the way excellent) book "Denotational Semantics", tells following story: "The decision which way round the digits run is, of course, mathematically trivial. Indeed, one early British computer had numbers running from right to left (because the spot on an oscilloscope tube runs from left to right, but in serial logic the least significant digits are dealt with first). Turing used to mystify audiences at public lectures when, quite by accident, he would slip into this mode even for decimal arithmetic, and write things like 73+42=16. The next version of the machine was made more conventional simply by crossing the x-deflection wires: this, however, worried the engineers, whose waveforms were all backwards. That problem was in turn solved by providing a little window so that the engineers (who tended to be behind the computer anyway) could view the oscilloscope screen from the back. [C. Strachey - private communication.]" You will play the role of the audience and judge on the truth value of Turing's equations. Input The input contains several test cases. Each specifies on a single line a Turing equation. A Turing equation has the form "a+b=c", where a, b, c are numbers made up of the digits 0,...,9. Each number will consist of at most 7 digits. This includes possible leading or trailing zeros. The equation "0+0=0" will finish the input and has to be processed, too. The equations will not contain any spaces. Output For each test case generate a line containing the word "True" or the word "False", if the equation is true or false, respectively, in Turing's interpretation, i.e. the numbers being read backwards. Sample Input 73+42=16 5+8=13 10+20=30 0001000+000200=00030 1234+5=1239 1+0=0 7000+8000=51 0+0=0 Sample Output True False True True False False True True
Final Standings
WishingBone is quite familiar with rules of contest ranking. It states: Teams are ranked by the number of problems solved in descending order; Teams that solved the same number of problems are ranked by total penalty time in ascending order; Teams that have the same rank are listed by team id in lexicographic order, which should be case-insensitive. Additionally, A problem is solved only when the team has submitted an accepted run. Penalty time for a problem is the sum of the elapsed time of the first accepted run and twenty minutes for each previous run of this problem. Total penalty time is the sum of penalty time for all the problems solved. Only teams solved at least one problem will be listed on the final standing. You are requested to generate this final standing. Input The first line of input is a positive integer N (N <= 100), which is the number of problems of this contest. Each line of input represents one run in the form Elapsed Time Team ID Problem ID Judge Reply where Elapsed Time is the time from the start of the contest (in minute); Team ID is a string of no more than 30 upper and lower Latin characters; Problem ID is an integer from 1 to N; Judge Reply is one of AC, PE, CE, RE, TLE, MLE and OLE. Elasped Time will be in ascending order. The number of teams will not exceed 10000. Output Print one team on each line in the form Rank Team ID Problems Solved Total Penalty Time the first three of which should be left-justified in fields of 10, 30 and 10. Refer to sample output for more details. Sample Input 3 30 Fatmouse 1 WA 32 Killer 2 AC 39 Turing 3 RE 56 Fatmouse 2 CE 63 Turing 3 AC 77 Killer 1 PE 79 Killer 1 AC 83 ZzZzZ 3 AC 89 Fatmouse 3 OLE 89 Chenyue 3 AC Sample Output 1 Killer 2 131 2 Turing 1 83 ZzZzZ 1 83 4 Chenyue 1 89
webwork+jpa+spring 启动时候注入userservice出错...
Language of FatMouse
We all know that FatMouse doesn't speak English. But now he has to be prepared since our nation will join WTO soon. Thanks to Turing we have computers to help him. Input Specification Input consists of up to 100,005 dictionary entries, followed by a blank line, followed by a message of up to 100,005 words. Each dictionary entry is a line containing an English word, followed by a space and a FatMouse word. No FatMouse word appears more than once in the dictionary. The message is a sequence of words in the language of FatMouse, one word on each line. Each word in the input is a sequence of at most 10 lowercase letters. Output Specification Output is the message translated to English, one word per line. FatMouse words not in the dictionary should be translated as "eh". Sample Input dog ogday cat atcay pig igpay froot ootfray loops oopslay atcay ittenkay oopslay Output for Sample Input cat eh loops
A CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) is a program that generates and grades tests that are human solvable, but intends to be beyond the capabilities of current computer programs. This technology is now almost a standard security mechanism for defending against undesirable or malicious Internet bot programs, such as those spreading junk emails and those grabbing thousands of free email accounts instantly. It is hard for you to make a program to solve it. So, in this moment, you just need to solve a much easier problem. 111111MMM1111111 1MMMMMMMMMMM1111 11111MMMMMMMM111 1MMMMMMMMMMM1111 1MMMMMMMMMMMM111 1MMMMMMMMMMMMM11 11111MM1MM111111 1MM11111111MM111 111MM1111111MM11 1MM111111111MM11 1MM1111111111111 1MM1111111111111 1111MM111MM11111 1MM11111111MM111 11MM111111111MM1 1MM1111111111MM1 1MM1111111111111 1MM1111111111111 111MMMMMMMMM1111 1MMMMMMMMMMM1111 11MM111111111111 1MM1111111111MM1 1MMMMMMMMMMMM111 1MMMMMMMMMMMMM11 11MM1111111MM111 1MM11111111MM111 11MM111111111MM1 1MM1111111111MM1 1MM1111111111111 1MM1111111111111 1MMM11111111MM11 1MM11111111MM111 111MM1111111MM11 1MM111111111MM11 1MM1111111111111 1MM1111111111111 1MM1111111111MM1 1MMMMMMMMMMM1111 11111MMMMMMMM111 1MMMMMMMMMMM1111 1MMMMMMMMMMMM111 1MM1111111111111 11111MMMMMMMM111 1MM111111111MM11 11111MMMMMM11111 1111MMMMMMMM1111 11MM111111MMM111 11MM111111111111 111MM1111111MM11 1MM111111111MM11 1111111MM1111111 1111111MM1111111 11MM11111MMM1111 11MM111111111111 11MM111111111MM1 1MM111111111MM11 1111111MM1111111 1111111MM1111111 11MM111MMM111111 11MM111111111111 11MM111111111111 1MMMMMMMMMMMMM11 1111111MM1111111 1111111MM1111111 11MMMMM111111111 11MM111111111111 11MM111111MMMMM1 1MM111111111MM11 1111111MM1111111 111MM11MM1111111 11MM111MMM111111 11MM111111111111 111MM1111111MM11 1MM111111111MM11 1111111MM1111111 111MMM1MM1111111 11MM11111MMM1111 11MM111111111111 11111MMMMMMMMM11 1MM111111111MM11 11111MMMMMM11111 11111MMMM1111111 11MM111111MMMM11 11MMMMMMMMMMMM11 1MM1111111111MM1 1MMM111111111MM1 11111MMMMMM11111 1MMMMMMMMMMM1111 11111MMMMMM11111 1MMMMMMMMMMM1111 1MMMM111111MMMM1 1MMMM11111111MM1 111MMM1111MMM111 1MM111111111MM11 111MMM1111MMM111 1MM111111111MM11 1MM1MM1111MM1MM1 1MM1MM1111111MM1 11MMM111111MMM11 1MM1111111111MM1 11MMM111111MMM11 1MM1111111111MM1 1MM11MMMMM111MM1 1MM11MM111111MM1 1MM1111111111MM1 1MM111111111MM11 1MM1111111111MM1 1MM111111111MM11 1MM1111M11111MM1 1MM1111MM1111MM1 11MMM111111MMM11 1MMMMMMMMMMM1111 11MMM1MMMM1MMM11 1MMMMMMMMMMM1111 1MM1111111111MM1 1MM111111MMM1MM1 111MMM1111MMM111 1MM1111111111111 111MMM11MMMMM111 1MM11111111MM111 1MM1111111111MM1 1MM11111111MMMM1 11111MMMMMM11111 1MM1111111111111 111111MMMM1MMMM1 1MM111111111MMM1 1111MMMMMMMM1111 11MMMMMMMMMMMM11 1MM1111111111MM1 1MMMM111111MMMM1 1MM1111111111MM1 11MMM111111MMM11 111MM1111111MM11 11MMMMMMMMMMMM11 1MM1111111111MM1 11MMM111111MMM11 1MM1111111111MM1 111MMM1111MMM111 11MMM1111111MMM1 1111111MM1111111 1MM1111111111MM1 11MMM111111MMM11 11MM111MM111MM11 1111MMM11MMM1111 1111MMMMM1111111 1111111MM1111111 1MM1111111111MM1 111MMM1111MMM111 11MM111MM111MM11 111111MMMM111111 1MMM111MMMM11111 1111111MM1111111 1MMM11111111MMM1 1111MMM11MMM1111 11MM111MM111MM11 1111MMM11MMM1111 111MMM11111MMM11 1111111MM1111111 1MMM11111111MMM1 11111MM11MM11111 11MM1MM11MM1MM11 111MMM1111MMM111 11111MMMMMMM1111 1111111MM1111111 111MMMMMMMMMM111 111111MMMM111111 111MMM1111MMM111 11MMM111111MMM11 11MMM111111MMM11 111MMMMMMMMMM111 111MMM1111MMM111 1111111111MM1111 1111MMM11MMM1111 111111111MM11111 111111MMMM111111 11111111MM111111 1111111MM1111111 111111MM11111111 1111111MM1111111 11111MM111111111 1111111MM1111111 111MMMMMMMMMMM11 Assume that the CAPTCHA only consists of upper letters. As you see above, we use a 7 * 16 matrix to represent a letter. The matrix only consists of characters 'M' and '1'. You can make sure that the 'M' elements in each letter matrix are connected. It means the all 'M' elements in each letter matrix belong to only ONE component. One element is connected to its 8 neighboring elements. That is, (x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1), (x - 1, y - 1), (x + 1, y - 1), (x - 1, y + 1), (x + 1, y + 1) are (x, y)'s neighbors and they are connected. And you will be given a bigger matrix, like this: 11111MMMMMM111 1111111MM11111 1MM1111MM11111 1MM1111MM11111 1MM1111MM11111 1MM1111MM11111 1MM11MMMMMM111 1MM11111111111 1MMMMMMMMMMMM1 Your task is to tell which letters appear in this matrix. The answer for this sample is 'I' and 'L'. What's more, the letter appears in the bigger matrix may rotate 180 degrees, like this: 1MM1111111111MM11111111111111MM1 1MM1111111111MM11111111111111MM1 1MM1111111111MM11111111111111MM1 1MM1111111111MM111MMMMMMMMMMMMM1 1MMM11111111MMM11111111111111MM1 1MMM11111111MMM11111111111111MM1 111MMMMMMMMMM11111MMMMMMMMMMMMM1 'F' and 'U' is the answer for this sample. OK, it's time for you to finish this work. Input There are multiple cases (no more than 10). In the first line, two integers n and m (7 <= n, m <= 300) will be given. Following n lines give the matrix. Each line contains m characters. Each character will be either 'M' or '1'. You may assume that 'M' characters between any pair of letters in the matrix won't be connected, and each 'M' in the matrix belongs to one valid letter. There is a blank line between cases. Output Output the characters appear in the matrix. If a letter appears more than once, just output it ONE time. Sort the answer in alphabet order. Sample Input 7 32 1MM1111111111MM11111111111111MM1 1MM1111111111MM11111111111111MM1 1MM1111111111MM11111111111111MM1 1MM1111111111MM111MMMMMMMMMMMMM1 1MMM11111111MMM11111111111111MM1 1MMM11111111MMM11111111111111MM1 111MMMMMMMMMM11111MMMMMMMMMMMMM1 15 19 11111MMMMMMMM111111 111MM1111111MM11111 11MM111111111MM1111 11MM111111111111111 11MM111111MMMMM1111 111MM1111111MM11111 11111MMMMMMMMM11111 1111111111111111111 1111MMM111111111MM1 1111MMMM11111111MM1 1111MM1MM1111111MM1 1111MM11MM111111MM1 1111MM1111MM1111MM1 1111MM111111MMM1MM1 1111MM11111111MMMM1 Sample Output FU GN

《奇巧淫技》系列-python！！每天早上八点自动发送天气预报邮件到QQ邮箱

8年经验面试官详解 Java 面试秘诀
作者 | 胡书敏 责编 | 刘静 出品 | CSDN（ID：CSDNnews） 本人目前在一家知名外企担任架构师，而且最近八年来，在多家外企和互联网公司担任Java技术面试官，前后累计面试了有两三百位候选人。在本文里，就将结合本人的面试经验，针对Java初学者、Java初级开发和Java开发，给出若干准备简历和准备面试的建议。   Java程序员准备和投递简历的实

MyBatis研习录(01)——MyBatis概述与入门
C语言自学完备手册(33篇) Android多分辨率适配框架 JavaWeb核心技术系列教程 HTML5前端开发实战系列教程 MySQL数据库实操教程(35篇图文版) 推翻自己和过往——自定义View系列教程(10篇) 走出思维困境，踏上精进之路——Android开发进阶精华录 讲给Android程序员看的前端系列教程(40集免费视频教程+源码) 版权声明 本文原创作者：谷哥的小弟 作者博客

Python爬虫爬取淘宝，京东商品信息

Java工作4年来应聘要16K最后没要,细节如下。。。

Python爬虫精简步骤1 获取数据

Python绘图，圣诞树，花，爱心 | Turtle篇
1.画圣诞树 import turtle screen = turtle.Screen() screen.setup(800,600) circle = turtle.Turtle() circle.shape('circle') circle.color('red') circle.speed('fastest') circle.up() square = turtle.Turtle()

CPU对每个程序员来说，是个既熟悉又陌生的东西？ 如果你只知道CPU是中央处理器的话，那可能对你并没有什么用，那么作为程序员的我们，必须要搞懂的就是CPU这家伙是如何运行的，尤其要搞懂它里面的寄存器是怎么一回事，因为这将让你从底层明白程序的运行机制。 随我一起，来好好认识下CPU这货吧 把CPU掰开来看 对于CPU来说，我们首先就要搞明白它是怎么回事，也就是它的内部构造，当然，CPU那么牛的一个东

2020年1月17日，国家统计局发布了2019年国民经济报告，报告中指出我国人口突破14亿。 猪哥的朋友圈被14亿人口刷屏，但是很多人并没有看到我国复杂的人口问题：老龄化、男女比例失衡、生育率下降、人口红利下降等。 今天我们就来分析一下我们国家的人口数据吧！ 更多有趣分析教程，扫描下方二维码关注vx公号「裸睡的猪」 即可查看！ 一、背景 1.人口突破14亿 2020年1月17日，国家统计局发布
web前端javascript+jquery知识点总结
Javascript javascript 在前端网页中占有非常重要的地位，可以用于验证表单，制作特效等功能，它是一种描述语言，也是一种基于对象（Object）和事件驱动并具有安全性的脚本语言 ，语法同java类似，是一种解释性语言，边执行边解释。 JavaScript的组成： ECMAScipt 用于描述: 语法，变量和数据类型，运算符，逻辑控制语句，关键字保留字，对象。 浏览器对象模型（Br
Python实战：抓肺炎疫情实时数据，画2019-nCoV疫情地图

Python：爬取疫情每日数据

B 站上有哪些很好的学习资源?

Web播放器解决了在手机浏览器和PC浏览器上播放音视频数据的问题，让视音频内容可以不依赖用户安装App，就能进行播放以及在社交平台进行传播。在视频业务大数据平台中，播放数据的统计分析非常重要，所以Web播放器在使用过程中，需要对其内部的数据进行收集并上报至服务端，此时，就需要对发生在其内部的一些播放行为进行事件监听。 那么Web播放器事件监听是怎么实现的呢？ 01 监听事件明细表 名
3万字总结，Mysql优化之精髓

Python新型冠状病毒疫情数据自动爬取+统计+发送报告+数据屏幕（三）发送篇

1. 传统事件绑定和符合W3C标准的事件绑定有什么区别？ 传统事件绑定 &lt;div onclick=""&gt;123&lt;/div&gt; div1.onclick = function(){}; &lt;button onmouseover=""&gt;&lt;/button&gt; 注意： 如果给同一个元素绑定了两次或多次相同类型的事件，那么后面的绑定会覆盖前面的绑定 （不支持DOM事...

Python学习笔记（语法篇）

Python绘图与可视化

MySQL表的增删查改(提高篇)
MySQL表的增删查改(基本篇) 接上一篇MySQL表基本的增删查改，下面看一下提高篇： 一、数据库约束 1、约束类型 NOT NULL：不为空约束。创建表时，可以指定某列不为空 UNIQUE ：唯一约束。指定某列为唯一的、不重复的 DEFAULT ：默认值约束。指定插入数据时，某列为空，设置默认值 PRIMARY KEY ： 主键约束。NOT NULL 和 UNIQUE 的结合。确保某列（或两个...