qq_34644971 2018-09-23 08:37 Acceptance rate: 50%

A for loop inside a Python statement

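I'm new to Python and came across a for loop written inside a single statement. I can't work out what this for loop means.
How would it be written as an ordinary for loop?
This is the line I don't understand: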

Q[s, a] = np.sum([T[s, a, sp] * (R[s, a, sp] + discount_rate * np.max(Q_prev[sp])) for sp in range(3)])

Here is the complete code. It runs in Jupyter with Python 3.6.5.

import numpy as np
nan = np.nan
T = np.array([[[0.7,0.3,0.0],[1.0,0.0,0.0],[0.8,0.2,0.0]],
              [[0.0,1.0,0.0],[nan,nan,nan],[0.0,0.0,1.0]],
              [[nan,nan,nan],[0.8,0.1,0.1],[nan,nan,nan]]
             ])
R = np.array([[[10.,0.0,0.0],[0.0,0.0,0.0],[0.0,0.0,0.0]],
              [[10.,0.0,0.0],[nan,nan,nan],[0.0,0.0,-50.0]],
              [[nan,nan,nan],[40.0,0.0,0.0],[nan,nan,nan]]
             ])
possible_actions = [[0,1,2],[0,2],[1]]

Q = np.full((3, 3), -np.inf)  # -inf marks impossible actions
for state, actions in enumerate(possible_actions):
    Q[state, actions] = 0.0  # initialize every possible action to 0.0
learning_rate = 0.01
discount_rate = 0.95
n_iterations = 100
for iteration in range(n_iterations):
    Q_prev = Q.copy()
    for s in range(3):
        for a in possible_actions[s]:
            Q[s, a] = np.sum([T[s, a, sp] * (R[s, a, sp] + discount_rate * np.max(Q_prev[sp]))
                              for sp in range(3)])

1 answer

  • cjk_cjk 2018-09-23 10:25

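    Written as an ordinary for loop, that line becomes:
    total = 0.0
    for sp in range(3):
        total += T[s, a, sp] * (R[s, a, sp] + discount_rate * np.max(Q_prev[sp]))
    Q[s, a] = total
    The sum has to start from 0; adding directly into Q[s, a] would also count its old value.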
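
That the one-line form and the loop form give the same number can be checked with a small sketch. The rows T_sa and R_sa are copied from the question's T and R for s = 0, a = 0; the Q_prev values are made up purely for this check and are not from the original code.

```python
import numpy as np

discount_rate = 0.95
# Rows for s = 0, a = 0, copied from the T and R arrays in the question.
T_sa = np.array([0.7, 0.3, 0.0])
R_sa = np.array([10.0, 0.0, 0.0])
# Hypothetical Q_prev values, chosen only for this check.
Q_prev = np.array([[1.0, 2.0, 3.0],
                   [4.0, -np.inf, 5.0],
                   [-np.inf, 6.0, -np.inf]])

# One-line form: build a list of three terms, then add them up.
q_comprehension = np.sum([T_sa[sp] * (R_sa[sp] + discount_rate * np.max(Q_prev[sp]))
                          for sp in range(3)])

# Loop form: accumulate the same three terms one at a time.
q_loop = 0.0
for sp in range(3):
    q_loop += T_sa[sp] * (R_sa[sp] + discount_rate * np.max(Q_prev[sp]))

print(np.isclose(q_comprehension, q_loop))
```

The loop accumulates into a separate variable that starts at 0.0, so the result does not depend on whatever Q[s, a] held before.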

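    A simpler example of the same syntax:
    t = [i for i in range(3)]
    is equivalent to:
    t = []
    for i in range(3):
        t.append(i)
    This form is called a list comprehension, and it is a very convenient shorthand.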
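
The same pattern works when the element is an expression rather than just the loop variable, which is the case in the Q[s, a] line. A minimal sketch, with the names squares, squares_loop and total invented for illustration:

```python
import numpy as np

# General shape: [expression for name in iterable]
squares = [i * i for i in range(3)]

# The same list built with an explicit loop.
squares_loop = []
for i in range(3):
    squares_loop.append(i * i)

# Passing the list to np.sum adds up its elements, as in the Q[s, a] line.
total = np.sum(squares)

print(squares, squares_loop, total)
```

The comprehension only builds the list; the summing is done by np.sum, which is why the loop version of the Q[s, a] line needs its own running total.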

    This answer was accepted by the asker as the best answer.
