# Problem: with CategoricalHMM, model.predict(X)[position-1] raises
# ValueError: transmat_ rows must sum to 1 (got row sums of [1., 0., 0.]).
# Goal: predict the Korean vowel category at a given position. Words are
# split into syllables, syllables into phonemes, and only the sequence of
# vowel categories is kept.
import numpy as np
from hmmlearn import hmm
import jamotools
# Specify the position of the vowel to predict
test_position = 2  # 1-based index of the vowel to hide/predict in each word
# Define the vowel categories
# The three classes partition all 19 Korean vowels (presumably the yang /
# yin / neutral vowel-harmony classes -- TODO confirm intended grouping).
vowel_categories = {'0': ['ㅏ', 'ㅑ', 'ㅗ', 'ㅛ', 'ㅐ', 'ㅘ', 'ㅚ', 'ㅙ'],
'1': ['ㅓ', 'ㅕ', 'ㅜ', 'ㅠ', 'ㅔ', 'ㅝ', 'ㅟ', 'ㅞ'],
'2': ['ㅡ', 'ㅣ', 'ㅢ']}
# Hidden-state labels; order matters: HMM state index i maps to states[i]
# when decoding in predict_vowel_category.
states = ['0', '1', '2']
# All 19 observable vowels, in the column order of the emission matrices.
observations = np.array(['ㅏ', 'ㅑ', 'ㅗ', 'ㅛ', 'ㅐ', 'ㅘ', 'ㅚ', 'ㅙ', 'ㅓ', 'ㅕ', 'ㅜ', 'ㅠ', 'ㅔ', 'ㅝ', 'ㅟ', 'ㅞ', 'ㅡ', 'ㅣ', 'ㅢ'])
# Emission variant 3: each state emits a broad, near-uniform range of vowels.
# NOTE(review): defined but never used below (learn_markov_chain uses
# emission_probability2); rows only sum approximately to 1.
emission_probability3 = np.array([[0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0.09, 0.09, 0.09],
[0, 0, 0, 0, 0, 0, 0, 0, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09, 0.09],
[0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0526, 0.0532, 0.0526]])
# Emission matrix actually used: state 0 emits only category-0 vowels
# (columns 0-7), state 1 only category-1 vowels (columns 8-15), state 2
# only category-2 vowels (columns 16-18).
emission_probability2 = np.array([[0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.33, 0.34, 0.33]])
# Function to split a Korean character into its components
def split_korean_character(character):
    """Return the jamo components of one Korean syllable as a list of str."""
    return [str(jamo) for jamo in jamotools.split_syllables(character)]
# Assign vowel categories to each vowel in the text
def mark_words(text):
    """Reduce each word of `text` to its vowel-category string.

    Every word is mapped to the sequence of category symbols ('0'/'1'/'2')
    of its vowels; consonants and non-Korean characters contribute nothing.
    Only words with more than ``test_position + 1`` vowels are kept, so a
    vowel at ``test_position`` (plus surrounding context) always exists.

    Returns a list of category strings, one per retained word.
    """
    marked_text = []
    for word in text.split():
        # Strip latin letters, digits and punctuation glued onto the word.
        word = word.strip("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-'!?.,—%1234567890(《》<>)")
        marked_word = ""
        for char in word:
            for component in split_korean_character(char):
                # Consistency fix: look the vowel up in the shared
                # vowel_categories table instead of duplicating its lists
                # in hard-coded elif branches.
                for category, vowels in vowel_categories.items():
                    if component in vowels:
                        marked_word += category
                        break
        if len(marked_word) > test_position + 1:
            marked_text.append(marked_word)
    return marked_text
# Learn from the annotated data
def learn_markov_chain(data):
    """Fit a CategoricalHMM on sequences of vowel-category symbols.

    data: list of strings over the alphabet {'0','1','2'}, one per word.
    Returns the fitted model.

    Bug fix: when a hidden state is never exited during training, hmmlearn
    leaves that row of ``transmat_`` as all zeros (it only warns: "Some
    rows of transmat_ have zero sum...").  ``model.predict`` later fails
    its validity check with "transmat_ rows must sum to 1 (got row sums of
    [1. 0. 0.])".  We repair any zero-sum row with a uniform distribution
    and renormalize, so the model is always decodable.
    """
    # init_params='st': only startprob_ and transmat_ are randomly
    # initialized; the emission matrix below is used as the starting point.
    model = hmm.CategoricalHMM(n_components=len(states), n_features=19,
                               init_params='st')
    model.emissionprob_ = np.array(emission_probability2, dtype=float)
    observed = []
    lengths = []
    for sentence in data:
        lengths.append(len(sentence))
        # Each element of `sentence` is already a single category char.
        observed.extend(sentence)
    X = np.array([[int(obs)] for obs in observed])
    model.fit(X, np.asarray(lengths, dtype=int))
    # Repair degenerate transition rows left by EM for unvisited states.
    row_sums = model.transmat_.sum(axis=1)
    dead_rows = row_sums == 0
    if dead_rows.any():
        model.transmat_[dead_rows] = 1.0 / model.n_components
        row_sums = model.transmat_.sum(axis=1)
    # Renormalize to guard against numeric drift as well.
    model.transmat_ = model.transmat_ / row_sums[:, None]
    return model
# Test the model by hiding the specified syllable at the test position in each word
def predict_vowel_category(model, text, position):
    """Hide the vowel at `position` (1-based) in each word and predict it.

    For every marked word, removes the category symbol at `position`,
    decodes the remaining sequence with the HMM, and compares the state
    predicted at that position with the hidden true category.  Prints
    per-word predictions, overall accuracy, and accuracy restricted to
    words that violate vowel harmony (a mix of class-0 and class-1 vowels).
    """
    print("prediction begins")
    sentence = mark_words(text)
    corretto = 0            # total correct predictions
    fiadata = 0             # words violating vowel harmony
    corretto_fiadata = 0    # correct predictions among violating words
    for word_idx, word in enumerate(sentence):
        if position <= len(word):
            hidden_word = word[:position-1] + word[position:]
            X = np.array([[int(char)] for char in hidden_word])
            # calculate correct ratio
            predicted_category = states[model.predict(X)[position-1]]
            real_category = word[position-1]
            print(f"Word '{text.split()[word_idx]}': Predicted category at position {position} is {predicted_category}, Real category is {real_category}")
            if predicted_category == real_category:
                corretto += 1
            # Vowel-harmony check on plain ints -- the original compared
            # 1-element numpy rows against ints, which made `add` an
            # ndarray and was fragile.
            add = 0
            len_drop = 0
            for row in X:
                value = int(row[0])
                if value != 2:
                    add += value
                    len_drop += 1
            # Mixed 0s and 1s among the non-neutral vowels => violation.
            if add != len_drop and add != 0:
                fiadata += 1
                if predicted_category == real_category:
                    corretto_fiadata += 1
                print('add:', add, 'len_drop:', len_drop, 'Vowel Harmony Vialation')
    # Guard against empty test sets (previously ZeroDivisionError).
    if sentence:
        print('ratio:', corretto / len(sentence))
    else:
        print('ratio: n/a (no test words)')
    # Guard: fiadata may be 0 (previously ZeroDivisionError).
    if fiadata:
        print('fiadata:', fiadata, 'ratio of vialated:', corretto_fiadata / fiadata)
    else:
        print('fiadata: 0 (no vowel-harmony violations in test set)')
# Training text
with open('korean2.txt', 'r', encoding='utf-8') as korean2:
    training_text = korean2.read()
marked_data = mark_words(training_text)
# NOTE(review): the original looped `for i in range(200)` around
# learn_markov_chain, but each call builds and fits a brand-new model,
# discarding the previous one -- only the final fit ever survived.
# A single fit is therefore equivalent (up to random initialization).
model = learn_markov_chain(marked_data)
print("training time:0")
# Test text
with open('korean1.txt', 'r', encoding='utf-8') as korean1:
    test_text = korean1.read()  # Test text
# Keep only words long enough to contain a vowel at test_position.
# (Linear comprehension replaces the original O(n^2) collect-then-"not in"
# list-membership filter; the retained words are identical.)
test_split = [w for w in test_text.split() if len(w) >= test_position + 1]
test_text = ' '.join(test_split)
predicted_category = predict_vowel_category(model, test_text, test_position)
"""
Some rows of transmat_ have zero sum because no transition from the state was ever observed.
Some rows of transmat_ have zero sum because no transition from the state was ever observed.
training time:0
prediction begins
X: [[0]
[0]
[1]]
Traceback (most recent call last):
File "c:\Users\Desktop\项目\概率论\hmm2.py", line 140, in <module>
predicted_category = predict_vowel_category(model, test_text, test_position)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\Desktop\项目\概率论\hmm2.py", line 97, in predict_vowel_category
predicted_category = states[model.predict(X)[position-1]]
^^^^^^^^^^^^^^^^
File "C:\conda\envs\mne\Lib\site-packages\hmmlearn\_emissions.py", line 27, in <lambda>
return functools.wraps(func)(lambda *args, **kwargs: func(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^
File "C:\conda\envs\mne\Lib\site-packages\hmmlearn\base.py", line 375, in predict
_, state_sequence = self.decode(X, lengths)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\conda\envs\mne\Lib\site-packages\hmmlearn\_emissions.py", line 27, in <lambda>
return functools.wraps(func)(lambda *args, **kwargs: func(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^
File "C:\conda\envs\mne\Lib\site-packages\hmmlearn\base.py", line 336, in decode
self._check()
File "C:\conda\envs\mne\Lib\site-packages\hmmlearn\hmm.py", line 139, in _check
super()._check()
File "C:\conda\envs\mne\Lib\site-packages\hmmlearn\base.py", line 977, in _check
self._check_sum_1("transmat_")
File "C:\conda\envs\mne\Lib\site-packages\hmmlearn\base.py", line 951, in _check_sum_1
raise ValueError(
ValueError: transmat_ rows must sum to 1 (got row sums of [1. 0. 0.])
"""