weixin_39843738
2020-12-09 00:23

Some possibly unprofessional questions

Hello, I am a college student; I have read your paper and run your code. I have some questions that may not be professional, but I hope to get your answers.

1. You give three commands in the example. What is the difference between "launch supervised learning for policy estimation" and "launch policy gradient using network parameter just obtained"? What do the generated files such as pg_su_net_file_0.pkl and pg_re_10.pkl store? What are pg_su and pg_re short for?

2. Which parameters in the paper do simu_len and num_ex in the commands correspond to?

3. When I run the first command, python launcher.py --exp_type=pg_su --simu_len=50 --num_ex=1000 --ofile=data/pg_su --out_freq=10, my computer (8 GB of memory) can only run 8 epochs before reporting out of memory. In another Q&A you suggested changing to --num_ex=10; I tried it, and it can run 640 epochs, but the gap between 640 and 10000 is too large. What did I do wrong? Will my results differ from the paper's results if I use --num_ex=10?

4. What is the difference and connection between each epoch in the code and the iteration, jobset, and episode in the paper? I am confused by these concepts.

5. The paper says "a fully connected hidden layer with 20 neurons, and a total of 89,451 parameters". Do you mean the neural network has only one hidden layer, and that layer has 20 neurons in total? Why are there 89,451 parameters?

Thanks for your answer. :)

This question comes from the open source project: hongzimao/deeprm

4 replies

  • weixin_39843738 · 5 months ago

    Hello. I'm sorry to disturb you again after so long. When I train with the parameters defined in your paper, my final results differ from the results in the paper. As shown below, after 1000 epochs my total reward plateaus at -100 and the slowdown plateaus at 2.7.

    [image: my result]

    [image: your result (Figure 6 in the paper)]

    I haven't changed any other code. Why is my result worse? I can guess at the following reasons:

    - I did not use supervised learning and started reinforcement learning directly. On that note, I would also like to ask whether supervised learning (imitation learning) only speeds up convergence without improving the final result.
    - The Theano version is different (I use 1.0.1).
    - Different parameter settings. My parameters.py follows Section 4.1 (DeepRM) of the paper; my main settings are as follows:

    
    self.num_epochs = 1000         # number of training epochs
    self.simu_len = 50             # length of the busy cycle that repeats itself 
    self.num_ex = 100              # number of sequences 
    
    self.output_freq = 100         # interval for output and store parameters 
    self.num_seq_per_batch = 20    # number of sequences to compute baseline 
    self.episode_max_length = 1000 # enforcing an artificial terminal 
    
    self.num_res = 2               # number of resources in the system 
    self.num_nw = 10               # maximum allowed number of work in the queue 
    
    self.time_horizon = 20         # number of time steps in the graph
    self.max_job_len = 15          # maximum duration of new jobs 
    self.res_slot = 10             # maximum number of available resource slots 
    self.max_job_size = 10         # maximum resource request of new work 
    
    self.backlog_size = 60         # backlog queue size 
    
    self.max_track_since_new = 10  # track how many time steps since last new jobs 
    
    self.job_num_cap = 40          # maximum number of distinct colors in current work graph 
    
    self.new_job_rate = 0.7        # lambda in new job arrival Poisson Process 
    
    self.discount = 1              # discount factor (1 = no discounting)
    

    Hope to get your help. Thank you.

  • weixin_39797393 · 5 months ago

    Thanks for your interest!

    1. It's for supervised learning and reinforcement learning. The .pkl files are pickle files for saved neural network parameters. pg_su stands for policy_gradient_trained_with_supervised_learning, and pg_re means policy_gradient_with_reinforcement_learning.

    2. They have their literal meanings. simu_len=50 corresponds to the "50t" in "... each experiment lasts for 50t ..." on page 5 of the paper. num_ex=100 means the number of experiments to run.

    3. Yeah, they can take a bit of time. Try reducing num_ex and you should be able to get a quick feel for the training experience.

    4. Epoch is the training iteration. Jobset is defined in paper section 3.3: "To train a policy that generalizes, we consider multiple examples of job arrival sequences during training, henceforth called jobsets." Episode is introduced in paper section 3.3 too: "In each episode, a fixed number of jobs arrive and are scheduled based on the policy, as described in §3.2. The episode terminates when all jobs finish executing."

    5. Each of the 20 neurons is fully connected to all elements in the input and output. You can also get the number of parameters by inspecting the saved neural network model (see the sketch below for the arithmetic).
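
    As a rough sanity check of where 89,451 can come from (the layer sizes below are inferred from the parameter count, not taken from the paper or code): with one hidden layer of h = 20 neurons, d input units, and a output units, a fully connected network has d*h + h + h*a + a parameters. Assuming a = num_nw + 1 = 11 actions and a flattened 20 x 223 state image (d = 4460), this reproduces the figure:

    def count_params(d, h, a):
        # input->hidden weights and biases, plus hidden->output weights and biases
        return d * h + h + h * a + a

    print(count_params(d=20 * 223, h=20, a=11))  # 89451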

    Hope this helps!

  • weixin_39843738 · 5 months ago

    Thank you for your reply; I benefited from it, but I still have some questions:

    1. Policy gradient is a reinforcement learning method. Why does policy_gradient_trained_with_supervised_learning appear? I haven't found any content about supervised learning in your paper. I hope you can describe the process of policy_gradient_trained_with_supervised_learning. In the supervised learning, what is the data? What is the label?

    2. In my last question 2, you mentioned "num_ex=100 means number of experiments to run". Can it be understood as jobsets = num_ex = 100, episodes = N = 20, jobsets per episode = 100/20 = 5 (i.e., there are 5 jobs in each episode)?

    3. You said "simu_len=50 means 50t". If the scenario above holds (that is, there are 5 jobs in each episode), and the agent executes more than one action in each time step, then would about 45t of the 50t (only 5t being used) be superfluous?

    4. In the generated pg_re_xxx.pkl_slowdown_fig.pdf, what is the physical meaning of the ordinate CDF? What is it an acronym for? I did not see a similar image in your paper.

    5. I'm more familiar with TensorFlow than Theano. I would like to try repeating your experiment in TensorFlow. Do you have any suggestions?

  • weixin_39797393 · 5 months ago
    1. It's for imitation learning. It was not mentioned in the paper; the code is just something we left over during our development, and it's still there for people who are interested. For your question, the data would be scheduling decisions from existing algorithms, and the "label" would be their actions (see the sketch at the end of this reply).

    2. I don't think these terms can be mixed up like this. They are largely independent of each other.

    3. Actions are taken continuously, not confined to just that number 5. (5 in the example above, I think, is the size of the action space.)

    4. CDF stands for cumulative distribution function: https://en.wikipedia.org/wiki/Cumulative_distribution_function (a small example follows below).

    5. If you are interested, we have a more recent work using this line of approach, but on a different application (video streaming). It's implemented with TensorFlow. Please check our project website for the paper, slides, and open-sourced code: http://web.mit.edu/pensieve/
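
    For point 1, here is a minimal sketch of that data/label idea. The environment API and the baseline scheduler (heuristic_action) are placeholders, not names from the deeprm code; any existing algorithm such as SJF or Packer could play the baseline role:

    import numpy as np

    def collect_imitation_data(env, heuristic_action, num_steps):
        # States observed while an existing scheduler runs are the inputs;
        # the actions that scheduler takes are the supervised "labels".
        states, labels = [], []
        for _ in range(num_steps):
            state = env.observe()             # current state image (placeholder API)
            action = heuristic_action(state)  # what the baseline algorithm would do
            states.append(state.flatten())
            labels.append(action)
            env.step(action)                  # advance the environment (placeholder API)
        return np.array(states), np.array(labels)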
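
    And for point 4, the y-axis of the slowdown figure is an empirical CDF: at each slowdown value x it shows the fraction of jobs whose slowdown is at most x. A minimal numpy/matplotlib sketch with made-up per-job slowdowns:

    import numpy as np
    import matplotlib.pyplot as plt

    slowdowns = np.array([1.0, 1.2, 1.2, 2.5, 4.0])   # made-up data, not from the experiment
    x = np.sort(slowdowns)
    cdf = np.arange(1, len(x) + 1) / len(x)           # fraction of jobs with slowdown <= x
    plt.step(x, cdf, where="post")
    plt.xlabel("job slowdown")
    plt.ylabel("CDF")
    plt.show()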

