weixin_39628247
2021-01-12 14:30 阅读 0

dump_only whitelist/inverse / load_only whilst including when dumping

I feel like this kind of touches on #375, but that didn't seem to align with the behaviour that I think many would find useful. Also highly related to #61

It would be useful to use the meta class (or constructor) to be able to specify a set of fields that are whitelisted for deserialisation. If the field doesn't exist in the iterable, it cannot be deserialised.

If a user of marshmallow had a large schema class, with many fields, but only wanted 1 of those fields to be deserializable/writable, from my understanding they could either: - name every field they want as read only in dump_only attribute (lots of fields, prone to errors, ie missing a field) - When loading data, specify only=('fieldname',) upon instantiating the schema. (prone to errors, forgetting to put this in when instantiating)

Am I missing something, or is this a shortcoming that could be addressed?

Also to note, the naming of load_only feels like a misnomer without reading the docs in detail - If you specify a field in load_only, it will only be available for loading, not dumping - but taking another perspective to this could imply that it's an iterable of fields that are whitelisted to be loaded, but can still be dumped.

该提问来源于开源项目:marshmallow-code/marshmallow

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享

9条回答 默认 最新

  • weixin_39727976 weixin_39727976 2021-01-12 14:30

    ... the naming of load_only feels like a misnomer ...

    I see what you mean with load_only. I assumed the wrong behavior until I looked at the docs.

    load_only: A list or tuple of fields to skip during serialization dump_only: A list or tuple of fields to skip during deserialization, read-only fields

    A good way to illustrate why this is a strange interface is to create a schema with a field that is both load_only and dump_only.

    
    class TestSchema(Schema):
        field1 = fields.Field()
        field2 = fields.Field()
    
        class Meta:
            load_only=('field1',)
            dump_only=('field1',)
    

    field1 will not be serialized nor deserialized in this example. load_only actually means "don't dump" and dump_only means "don't load".

    | | load | dump | | --- | --- | --- | | True | | | | False | dump_only | load_only |

    A clearer terminology might be "force" and "skip" which have clear connotations and no implicit behaviors.

    | | load | dump | | --- | --- | --- | | True | force_load | force_dump | | False | skip_load | skip_dump |

    The only ambiguity is priority, but I believe that in the rest of the library exclusion has priority over inclusion and instances have priority over classes.

    点赞 评论 复制链接分享
  • weixin_39628247 weixin_39628247 2021-01-12 14:30

    I see exactly what you're saying, and I understand the functionality of using load_only and dump_only.

    Whilst better names such as force_load, etc do improve the usability of the library, it still wouldn't expose API to whitelist fields from being loaded/dumped. (Unless specifying force_load or force_dump implies that any unlisted fields shouldn't be loaded or dumped)

    点赞 评论 复制链接分享
  • weixin_39727976 weixin_39727976 2021-01-12 14:30

    Unless specifying force_load or force_dump implies that any unlisted fields shouldn't be loaded or dumped.

    No, those options wouldn't solve the issue of whitelisting deserialized fields.

    The option currently used to whitelist a set of fields on a schema is fields, but it only applies to serialization, not deserialization, which seems a bit asymmetrical considering there is nothing about the name "fields" that implies serialization exclusively.

    a user of marshmallow had a large schema class, with many fields, but only wanted 1 of those fields to be deserializable

    It looks like partial loading (the partial option for Schema.load()) might be an existing solution. The "partial" white list has to be supplied on each load(), which doesn't address the concern of "forgetting to put [it] in", but the set of fields could be assigned to a variable for reuse.

     py
    deserializable_fields = ('fieldname',)
    ...
    data, errors = schema.load(partial=deserializable_fields)
    

    If the white list really needs to be declared on a schema class or instance, maybe a separate schema class should be declared.

    点赞 评论 复制链接分享
  • weixin_39955700 weixin_39955700 2021-01-12 14:30

    Interesting comment. I feel the same about the load_only + dump_only case.

    Looks like we're bearing the weight of history. Starting from scratch, we'd use better terms, like dont_load or skip_load or anything less ambiguous. But here we are.

    These are features so common that there must be a lot of those in the codes using Marshmallow, and the benefit for changing this is minimal. You get struck by this once, then you learn and you move on.

    Yet, is there any reason (apart from the time it takes to do it...) not to propose new keywords, and slowly deprecate the old ones?

    点赞 评论 复制链接分享
  • weixin_39955700 weixin_39955700 2021-01-12 14:30

    wrote:

    A clearer terminology might be "force" and "skip" which have clear connotations and no implicit behaviors.

    , do you think Marshmallow 3.0 could introduce skip_load/skip_dump and deprecate load_only/dump_only?

    点赞 评论 复制链接分享
  • weixin_39951773 weixin_39951773 2021-01-12 14:30

    What does "force" actually mean? If the field is not marked with "required", will "force" flag force to be required?

    I don't get why people are so concerned about naming and claim that it is used a lot. It should not. Most common - "password" is load_only, "id" - dump_only. If you have use cases that require you defining complex dependencies and load/dump schemas, you should really consider splitting them into a separate schemas (maybe with reusing common parts through base schema). That's it.

    PS I think load_only / dump_only semantics are OK.

    点赞 评论 复制链接分享
  • weixin_39955700 weixin_39955700 2021-01-12 14:30

    Just to clarify, there are two points at stake here:

    • the whitelist feature, which is what the OP is about,
    • the naming of load_only/dump_only (side note in the OP)

    Regarding the second point, there are two potentially surprising things about those names:

    As said in OP

    Also to note, the naming of load_only feels like a misnomer without reading the docs in detail - If you specify a field in load_only, it will only be available for loading, not dumping - but taking another perspective to this could imply that it's an iterable of fields that are whitelisted to be loaded, but can still be dumped.

    And as said

    A good way to illustrate why this is a strange interface is to create a schema with a field that is both load_only and dump_only

    I found this a bit surprising and so did my colleagues.

    As I said, it is no big deal. You get hit once, and then you understand and you know. It might not be worth the pain of changing load_only/dump_only in the whole library (and all the codes using it). But since it is a breaking change (mitigated by keeping old terms working although considered deprecated), the release of v3 would be a good time to do it.

    I don't mean to argue about this. I don't really mind, that was just a suggestion. And the "not worth the pain" argument is perfectly relevant.

    点赞 评论 复制链接分享
  • weixin_39951773 weixin_39951773 2021-01-12 14:30

    I agree with one thing: "load_only"/"dump_only" feilds in Meta class IS indeed confusing. It would be much easier to specify those flags directly in a field definition:

    python
    class UserSchema(m.Schema):
        id = mf.String(dump_only=True)
        email = mf.String(required=True)
        password = mf.String(load_only=True)
    

    So, in my understanding the Meta fields are mostly used when you're integrating with third-party schema system (e.g. SqlAlchemy), you want to automatically import all their stuff and then customize it further. So, I would say that this support should have never existed in the core library in the first place and let integration libraries handle all those customizations.

    点赞 评论 复制链接分享
  • weixin_39560245 weixin_39560245 2021-01-12 14:30

    I don't plan on adding a whitelist feature because it would be redundant and potentially confusing API. I think the OP's use case can be solved with a base class that uses the on_bind_field hook. Something like

    python
    from marshmallow import Schema, fields
    
    class ReadOnlySchema(Schema):
        """Schema that every field dump-only if it doesn't explicitly set load_only=True.
        """
        def on_bind_field(self, field_name, field_obj):
            # Any field that is not load-only will be set as dump-only
            if not field_obj.load_only:
                field_obj.dump_only = True
    
    
    class MySchema(ReadOnlySchema):
        id = fields.Str()
        password = fields.Str(load_only=True)
    
    schema = MySchema()
    
    assert schema.fields['id'].dump_only is True
    assert schema.fields['password'].load_only is True
    assert schema.load({'id': 'ignored', 'password': 'secret'}).data == {'password': 'secret'}
    assert schema.dump({'id': 'abc12', 'password': 'ignored'}).data == {'id': 'abc12'}
    

    As for the names dump_only and load_only--these make sense in the context of the dump_only and load_only field constructor parameters. does have a point that the class Meta options are used primarily by libraries that dynamically generate fields and therefore should be implemented in those libraries. That said, I think we'll keep those for the time being, if only to maintain compatibility.

    Also, I generally try to avoid parameter names that have a negative word in them, e.g. "skip", because double negatives (skip_load=False) add cognitive load for the user.

    Closing for now. We can reopen if further discussion is necessary.

    点赞 评论 复制链接分享

相关推荐