A frequent use case I have when using Blocks involves creating a graph of bricks (most of which are recursive) and needing to fine-tune the weights of only a few of the child bricks. The workflow suggested by the docs is to set the top-level initialization, call push_initialization_config, and then manually go in and set the required weights. This spreads the configuration across multiple locations in a code base and leaves no easy way to create prefab bricks with initializations already set inside them. In addition, the overwriting behavior seems unintuitive, as there is no warning when a user manually sets weights_init and sees no effect. For example:
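    from blocks.bricks import Initializable, Linear
    from blocks.initialization import Constant

    class MultiBrick(Initializable):
        def __init__(self, **kwargs):
            # lin1 is given its own initialization scheme up front
            self.lin1 = Linear(weights_init=Constant(1))
            self.lin2 = Linear()
            ......

    b = MultiBrick(weights_init=Constant(0), biases_init=Constant(0))
    b.initialize()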
I would expect b.lin1.weights_init to be Constant(1), but this is not the case. Here it is easy enough to set it manually after push_initialization_config, but that stops being practical as the graphs get larger.
As a solution, could Initializable._push_initialization_config first check whether child.weights_init exists and, if it does, leave the original value instead of overwriting it? A quick look at the source says
Initialization is the only class using
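To make the request concrete, here is a minimal sketch of the behavior I have in mind. It does not use Blocks at all: SketchBrick, its push_initialization_config method, and this Constant are hypothetical stand-ins written only for illustration, and the assumption is that an unset scheme is represented by None.

```python
class Constant(object):
    """Hypothetical stand-in for an initialization scheme."""
    def __init__(self, value):
        self.value = value


class SketchBrick(object):
    """Hypothetical stand-in for an Initializable brick (not the Blocks class)."""
    def __init__(self, weights_init=None, biases_init=None, children=None):
        self.weights_init = weights_init
        self.biases_init = biases_init
        self.children = children or []

    def push_initialization_config(self):
        for child in self.children:
            # Only fill in schemes the child has not set itself.
            if child.weights_init is None:
                child.weights_init = self.weights_init
            if child.biases_init is None:
                child.biases_init = self.biases_init
            # Recurse so that nested children follow the same rule.
            child.push_initialization_config()


lin1 = SketchBrick(weights_init=Constant(1))  # preset by the brick author
lin2 = SketchBrick()                          # nothing preset
b = SketchBrick(weights_init=Constant(0), biases_init=Constant(0),
                children=[lin1, lin2])
b.push_initialization_config()
```

With this rule, lin1 keeps its own weights_init while lin2, and the biases of both children, inherit the parent's schemes. A prefab brick could then ship with its own defaults and still pick up anything it leaves unset.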
Thanks for your consideration.