OOP approach to objects with many flavours

I’ve been writing code to parse and extract information from messages sent by a bot. There are only a few different kinds of messages, but each one of them contains wildly different kinds of information I’m interested in, and I’m struggling with finding the best way to deal with them as objects in my code.

If I were using Haskell, I’d simply create a type Message and define a tailored constructor for each kind of message:

data Message = Greeting Foo Bar | Warning Yadda Yadda Yadda | ...

It’s a very nice and clean way to both have them all under the same type and be able to tell the message kinds apart easily.

How would one go about designing object classes to that effect in an OOP-friendly (or better, Pythonic) way? I’ve thought of two approaches, namely:

  • Defining a base-class Message and subclassing it for each kind of message. Pros: conceptually clean. Cons: lots of boilerplate code, and it doesn’t really make the code very readable or the relationship between different message classes clear.

  • Defining a universal class Message, which represents every message type. It will have an attribute .type to differentiate between message kinds, and its __init__ function will instantiate the attributes appropriate to the message type accordingly. Pros: simple to code, practical. Cons: it seems like bad practice for the class’s attributes to be so unpredictable, and it generally feels wrong.

but I’m not completely satisfied with either. While I realize that this is just a small programme, I’m using it as an opportunity to learn more about the use of abstractions and software architecture. Can someone show me the way?

Answer

For a message class design, I’d use dataclasses to minimise the boilerplate. You get to focus entirely on the fields:

from dataclasses import dataclass

class Message:
    # common message methods
    ...

@dataclass
class Greeting(Message):
    foo: str
    bar: int

@dataclass
class Warning(Message):
    yadda: list[str]

There usually isn’t much more you need for a simple project. You could add a @classmethod factory to the Message base class to help generate specific message types, and Message could be a @dataclass itself too, if there are common attributes shared between the different types.

That said, once you start to factor in serialisation and deserialisation requirements, using a type field that is an enum can be helpful.
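
A minimal, standard-library-only sketch of what that can look like follows. It assumes a JSON object with a "type" key; the MESSAGE_CLASSES table and the dump() and load() helpers are hypothetical names, not part of any library:

```python
import enum
import json
from dataclasses import asdict, dataclass, field


class MessageType(enum.Enum):
    greeting = "greeting"
    warning = "warning"


@dataclass
class Greeting:
    foo: str
    bar: int
    # Fixed per class, so it is excluded from __init__.
    type: MessageType = field(default=MessageType.greeting, init=False)


@dataclass
class Warning:
    yadda: list[str]
    type: MessageType = field(default=MessageType.warning, init=False)


# Explicit lookup table from enum member to class.
MESSAGE_CLASSES = {
    MessageType.greeting: Greeting,
    MessageType.warning: Warning,
}


def dump(message) -> str:
    data = asdict(message)
    data["type"] = message.type.value  # enum members are not JSON-serialisable
    return json.dumps(data)


def load(text: str):
    data = json.loads(text)
    # MessageType(...) raises ValueError for an unknown type string.
    cls = MESSAGE_CLASSES[MessageType(data.pop("type"))]
    return cls(**data)
```

The enum gives you one place that lists every valid type value, and a typo in incoming data fails loudly at MessageType(...) rather than somewhere further down the line.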

To illustrate that point: for a current RESTful API project that includes automated OpenAPI 3.1 documentation, we are using Marshmallow to translate to and from JSON, marshmallow-dataclass to avoid repeating ourselves when defining the schema and validation, and marshmallow-oneofschema to express a polymorphic schema for a hierarchy of classes that differ by their type, much like your Message example.

Using 3rd-party libraries then constrains your options, so I used metaprogramming (mostly class.__init_subclass__ and Generic type annotations) to make it possible to concisely define such a polymorphic type hierarchy that’s keyed on an enum.

Your message type would be expressed like this:

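import enum
from dataclasses import dataclass

# PolymorphicType is our own custom helper class (not shown here)

class MessageType(enum.Enum):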
    greeting = "greeting"
    warning = "warning"
    # ...

@dataclass
class _BaseMessage(PolymorphicType[MessageType]):
    type: MessageType
    # ...

@dataclass
class Greeting(_BaseMessage, type_key=MessageType.greeting):
    foo: str
    bar: int

@dataclass
class Warning(_BaseMessage, type_key=MessageType.warning):
    yadda: list[str]

MessageSchema = _BaseMessage.OneOfSchema("MessageSchema")

after which messages are loaded from JSON using MessageSchema.load(), producing a specific instance based on the "type" key in the dictionary, e.g.

message = MessageSchema.load({"type": "greeting", "foo": "spam", "bar": 42})
isinstance(message, Greeting)  # True

while MessageSchema.dump() gets you suitable JSON output regardless of the input type:

message = Warning(["42", "117"])
MessageSchema.dump(message)  # {"type": "warning", "yadda": ["42", "117"]}

It is the use of an enum here that makes the integration work best; PolymorphicType is the custom class that handles most of the heavy lifting to make the _BaseMessage.OneOfSchema() call at the end work. You don’t have to use metaprogramming to achieve that last part, but for us it removed most of the marshmallow-oneofschema boilerplate.

Plus, we get OpenAPI schemas that reflect each specific message type, which documentation tools like Redocly know how to process:

components:
  schemas:
    Message:
      oneOf:
        - $ref: '#/components/schemas/Greeting'
        - $ref: '#/components/schemas/Warning'
      discriminator:
        propertyName: type
        mapping:
          greeting: '#/components/schemas/Greeting'
          warning: '#/components/schemas/Warning'
    Greeting:
      type: object
      properties:
        type:
          type: string
          default: greeting
        foo:
          type: string
        bar:
          type: integer
    Warning:
      type: object
      properties:
        type:
          type: string
          default: warning
        yadda:
          type: array
          items:
            type: string