Advanced Python

Python is a simple language, but its dynamic nature makes it easy to implement some magical behaviors. Although the runtime cost is high, the magic is very convenient. Combined with high-speed C++ and C code, the advanced Python features are powerful when used correctly.

Iterator

When processing a large amount of data, repetition is everywhere. The data need to be kept somewhere, and that is why we use containers. Then we write code to access the elements one by one. This pattern is called iteration.

Python provides the iterator protocol for this pattern. Let us see a simple example. Create a list holding 10 integer elements:

>>> # Build a list of numerical data:
>>> data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> type(data)  # Show the type: it's indeed a list.
<class 'list'>

Then iterate over all the elements in data through the for ... in ... loop:

>>> # This uses the iterator protocol.
>>> for v in data:
...     print(v)
...
0
1
2
3
4
5
6
7
8
9

Note

Python iteration is slow. High-performance code should almost never loop in Python. However, iteration in Python is easy to write and helpful when debugging, so a high-performance system commonly provides it as a supplement to a fast but hard-to-debug API.

Custom Iterator

The iterator protocol works closely with the for loop. A programmer can use it to change the behavior of the loop. In the following example, we create a class implementing the iterator protocol. It takes a sequence and creates a new value on the fly when returning each element.

class ListIterator:
    """
    This class adds an offset value to every element in the container.
    """

    def __init__(self, data, offset=0):
        self.data = data  # The data container.
        self.it = None  # The current index in the container.
        self.offset = offset  # The offset value.

    def __iter__(self):
        # Return the iterator object itself.
        return self

    def __next__(self):
        # Return the next element from the iterator.
        if self.it is None:
            self.it = 0
        elif self.it >= len(self.data) - 1:
            self.it = None
            # Raise StopIteration when there are no more elements to iterate.
            raise StopIteration
        else:
            self.it += 1
        return self.data[self.it] + self.offset

To use it, we need to create the custom iterator from the existing data list:

>>> # Iterate all elements but return (100 + v) instead of v.
>>> list_iterator = ListIterator(data, offset=100)
>>> # Print the iterator object and review the type.
>>> print(list_iterator)
<__main__.ListIterator object at 0x10cfaebd0>

Take a look at its members. Look for __iter__ and __next__.

>>> print(dir(list_iterator))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__',
'__iter__',
'__le__', '__lt__', '__module__', '__ne__', '__new__',
'__next__',
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', '__weakref__', 'data', 'it', 'offset']

Every time the loop construct needs the next element, ListIterator.__next__() is called. Let us see how it runs. The for loop iterating over list_iterator does not print the values in the data list, but the values offset by 100.

>>> # The iterator offsets the values in the list.
>>> for v in list_iterator:
...     print(v)
...
100
101
102
103
104
105
106
107
108
109
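
The for loop is not required, though. We can drive the iterator by hand with the built-in next(), which calls __next__() for us. A minimal sketch with a fresh iterator (the name li is ours):

>>> li = ListIterator(data, offset=100)
>>> next(li)  # Calls li.__next__().
100
>>> next(li)
101

When the elements run out, next(li) raises StopIteration, which is exactly what tells the for loop to stop.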

By hard-coding the offset value in the loop, you can do the same thing without the custom iterator. To put it another way, the custom iterator parameterizes the offset value.

>>> # Hard-code the offset value in the loop.
>>> for v in data:
...     print(v + 100)
...
100
101
102
103
104
105
106
107
108
109

Built-In Iterator

Python provides the built-in iterator helper iter(), so it is not really necessary to create a custom iterator class unless you need very special treatment. (The custom class ListIterator is not really special enough to warrant a custom class. We will cover this point later.)

>>> # Create an iterator object using the built-in iter() function.
>>> list_iterator2 = iter(data)
>>> # Check the type of the built-in iterator object:
>>> print(list_iterator2)
<list_iterator object at 0x10cfb2990>
>>> # The built-in iterator works the same as directly iterating the list.
>>> for v in list_iterator2:
...     print(v)
...
0
1
2
3
4
5
6
7
8
9

Comparison

Compare with the type of our custom iterator:

>>> print(list_iterator)
<__main__.ListIterator object at 0x10cfaebd0>

Take a look at the members of the built-in iterator. Look for __iter__ and __next__ like we did for the custom iterator.

>>> print(dir(list_iterator2))
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__',
'__iter__',
'__le__', '__length_hint__', '__lt__', '__ne__', '__new__',
'__next__',
'__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__']

Comparison

Compare with the members of our custom iterator. Both have __iter__ and __next__. The custom iterator additionally has __dict__, __module__, and __weakref__, plus its own attributes data, it, and offset, while the built-in iterator has the extra __length_hint__ and __setstate__.

>>> print(dir(list_iterator))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__',
'__iter__',
'__le__', '__lt__', '__module__', '__ne__', '__new__',
'__next__',
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', '__weakref__', 'data', 'it', 'offset']

Implicitly Created Iterator

The built-in iterator may also be created by calling the container.__iter__() method on the container object (iter() simply does it for you):

>>> # Create an iterator object using the iterator protocol.
>>> list_iterator3 = data.__iter__()
>>> print(list_iterator3)
<list_iterator object at 0x10cfbab90>

Aided by container.__iter__(), most of the time we can directly use a container in the for ... in ... construct, because the construct knows about the iterator protocol.
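
Conceptually, the for ... in ... construct is a shorthand for calling iter() once and next() repeatedly until StopIteration. A minimal sketch of the equivalent while loop (the real implementation lives inside the interpreter, but the protocol is the same):

it = iter(data)  # Calls data.__iter__().
while True:
    try:
        v = next(it)  # Calls it.__next__().
    except StopIteration:
        break  # The for loop ends silently here.
    print(v)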

Generator

A generator is an advanced use of the iterator protocol. It is a special way to make a function return an iterator, which is known as a generator iterator. Let us see a simple example of a generator function.

def list_generator(input_data, offset=0):
    """
    Generator function that returns the elements in a sequence with the given
    offset value added.
    """
    for v in input_data:
        # This is what makes it special.
        yield v + offset

What makes it special is the yield statement. A generator function uses the yield statement instead of the return statement. Calling the function does not run its body to completion; it returns a generator object, which yields the values one by one as it is iterated.

The generator function can do the same thing as the custom iterator class ListIterator we made previously. Let us see how it works step by step.

The first step is to call the generator function and print the returned object.

>>> # Call the generator function and get the return.
>>> generator = list_generator(data, offset=100)
>>> # Print the returned object.  See that it is not the value produced by "yield".
>>> print(generator)
<generator object list_generator at 0x10cf756d0>

The object returned by the generator function list_generator() is a generator object. The generator object is an iterator and has the methods iterator.__iter__() and iterator.__next__(). (The members close, gi_code, gi_frame, gi_running, gi_yieldfrom, send, and throw are not in the built-in iterator. But just like the additional members in our ListIterator, these members do no harm either.)

>>> print(dir(generator))
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__',
'__iter__',
'__le__', '__lt__', '__name__', '__ne__', '__new__',
'__next__',
'__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__',
'close', 'gi_code', 'gi_frame', 'gi_running', 'gi_yieldfrom', 'send',
'throw']

The second step is to put the generator in a for loop, and we will see that it works in the same way as the iterators we used earlier.

>>> # Iterate over the generator iterator.
>>> for v in generator:
...     print(v)
...
100
101
102
103
104
105
106
107
108
109
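
A generator runs lazily and can be consumed only once. A minimal sketch driving a fresh generator by hand (reusing list_generator() from above):

>>> generator = list_generator(data, offset=100)
>>> next(generator)  # The function body runs until the first yield.
100
>>> next(generator)  # It resumes right after the yield where it paused.
101

After the last element is yielded, next() raises StopIteration, just like the iterators above.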

List Comprehension

By list comprehension, we mean the construct:

[ ... for ... in ... ]

It is not an iterator itself, but it heavily depends on the iterator protocol.

Python borrowed the syntax of the list comprehension from other languages, e.g., Haskell. List comprehensions follow the iterator protocol, and that is why we discuss them in this section.

List comprehensions provide an elegant way to build a list in a one-liner, and make code look cleaner when used wisely. For example, the for loop:

>>> for v in data:
...     print(v)
...
0
1
2
3
4
5
6
7
8
9

can be replaced by a one-liner:

>>> print("\n".join([str(v) for v in data]))
0
1
2
3
4
5
6
7
8
9

They also make it easier to change the format of the print:

>>> print(", ".join([str(v) for v in data]))
0, 1, 2, 3, 4, 5, 6, 7, 8, 9

In the above examples, the code using the list comprehension is more readable than the for loop version. That is usually the guideline for choosing between a list comprehension and a for loop: readability.

Back to our custom iterator example: list comprehensions also provide a nicer way to do what we did with the class ListIterator.

>>> # The single line offset list.
>>> print([v+100 for v in data])
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
>>> # The multi-line version of the offset list.
>>> print("\n".join([str(v+100) for v in data]))
100
101
102
103
104
105
106
107
108
109

Note

While a list comprehension is mostly a shorthand for the for loop, it may run faster or slower than the loop, depending on the complexity of the statement and the container.
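
The timeit module gives a quick, environment-dependent way to compare the two forms; we do not quote numbers here because they vary across machines and interpreter versions (the variable names in this sketch are ours):

import timeit

setup = "data = list(range(1000))"
# Time the list comprehension.
t_comp = timeit.timeit("[v + 100 for v in data]", setup=setup, number=10000)
# Time the equivalent for loop that appends to a list.
stmt_loop = """
result = []
for v in data:
    result.append(v + 100)
"""
t_loop = timeit.timeit(stmt_loop, setup=setup, number=10000)
print(t_comp, t_loop)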

Generator Expression

List comprehensions pave the road for explaining what a generator expression is. The generator expression

( ... for ... in ... )

is a more convenient way to create a generator.

Note

Although a generator expression ( ... for ... in ... ) (using parentheses) looks like a list comprehension [ ... for ... in ... ] (using brackets), they are different things.

First, use a generator expression to create a generator and check its type:

>>> # Use the generator expression to create a generator.
>>> generator2 = (v + 100 for v in data)  # Hard-code the offset.
>>> # See the type of the object.
>>> print(generator2)
<generator object <genexpr> at 0x10cfce1d0>

Take a look at its members and confirm that it also has __iter__ and __next__.

>>> print(dir(generator2))
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__',
'__iter__',
'__le__', '__lt__', '__name__', '__ne__', '__new__',
'__next__',
'__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__',
'close', 'gi_code', 'gi_frame', 'gi_running', 'gi_yieldfrom', 'send',
'throw']

The generator iterator returned from the generator expression works just like the iterator shown before:

>>> for v in generator2:
...     print(v)
...
100
101
102
103
104
105
106
107
108
109

Because it is an expression, it can replace a list comprehension inside another expression. The one-liner that printed the data can have the brackets removed (turning the list comprehension into a generator expression):

>>> print("\n".join(str(v + 100) for v in data))
100
101
102
103
104
105
106
107
108
109

Comparison

Compare with the list comprehension.

>>> print("\n".join( [ str(v+100) for v in data ] ))
100
101
102
103
104
105
106
107
108
109

While it is not obvious in this small test, the list-comprehension version uses more memory than the generator-expression version, because the list comprehension really creates a new list. The generator, on the other hand, is a small iterator object. The difference in memory consumption becomes significant when the data object is large.
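
We can get a rough feeling for the difference with sys.getsizeof(); a minimal sketch (the names big, comp, and genx are ours), keeping in mind that getsizeof() reports only the container object itself, not the elements it refers to:

import sys

big = list(range(1_000_000))
comp = [v + 100 for v in big]  # Builds the whole list immediately.
genx = (v + 100 for v in big)  # Builds only a small iterator object.
print(sys.getsizeof(comp))  # Several megabytes: one reference per element.
print(sys.getsizeof(genx))  # Around a hundred bytes, regardless of data size.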

Stack Frame


Frame Object

We can get the frame object of the current stack frame using inspect.currentframe():

>>> import inspect
>>> f = inspect.currentframe()

A frame object has the following attributes:

  • Namespace:

    • f_builtins: builtin namespace seen by this frame

    • f_globals: global namespace seen by this frame

    • f_locals: local namespace seen by this frame

  • Other:

    • f_back: next outer frame object (this frame’s caller)

    • f_code: code object being executed in this frame

    • f_lasti: index of last attempted instruction in bytecode

    • f_lineno: current line number in Python source code

Let us see it ourselves:

>>> print([k for k in dir(f) if not k.startswith('__')])
['clear', 'f_back', 'f_builtins', 'f_code', 'f_globals', 'f_lasti',
'f_lineno', 'f_locals', 'f_trace', 'f_trace_lines', 'f_trace_opcodes']

We can learn many things about the frame from the object. For example, take a look at the builtin namespace (f_builtins):

>>> print(f.f_builtins.keys())
dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__',
'__build_class__', '__import__', 'abs', 'all', 'any', 'ascii', 'bin',
'breakpoint', 'callable', 'chr', 'compile', 'delattr', 'dir', 'divmod',
'eval', 'exec', 'format', 'getattr', 'globals', 'hasattr', 'hash', 'hex',
'id', 'input', 'isinstance', 'issubclass', 'iter', 'len', 'locals', 'max',
'min', 'next', 'oct', 'ord', 'pow', 'print', 'repr', 'round', 'setattr',
'sorted', 'sum', 'vars', 'None', 'Ellipsis', 'NotImplemented', 'False',
'True', 'bool', 'memoryview', 'bytearray', 'bytes', 'classmethod', 'complex',
'dict', 'enumerate', 'filter', 'float', 'frozenset', 'property', 'int',
'list', 'map', 'object', 'range', 'reversed', 'set', 'slice', 'staticmethod',
'str', 'super', 'tuple', 'type', 'zip', '__debug__', 'BaseException',
'Exception', 'TypeError', 'StopAsyncIteration', 'StopIteration',
'GeneratorExit', 'SystemExit', 'KeyboardInterrupt', 'ImportError',
'ModuleNotFoundError', 'OSError', 'EnvironmentError', 'IOError', 'EOFError',
'RuntimeError', 'RecursionError', 'NotImplementedError', 'NameError',
'UnboundLocalError', 'AttributeError', 'SyntaxError', 'IndentationError',
'TabError', 'LookupError', 'IndexError', 'KeyError', 'ValueError',
'UnicodeError', 'UnicodeEncodeError', 'UnicodeDecodeError',
'UnicodeTranslateError', 'AssertionError', 'ArithmeticError',
'FloatingPointError', 'OverflowError', 'ZeroDivisionError', 'SystemError',
'ReferenceError', 'MemoryError', 'BufferError', 'Warning', 'UserWarning',
'DeprecationWarning', 'PendingDeprecationWarning', 'SyntaxWarning',
'RuntimeWarning', 'FutureWarning', 'ImportWarning', 'UnicodeWarning',
'BytesWarning', 'ResourceWarning', 'ConnectionError', 'BlockingIOError',
'BrokenPipeError', 'ChildProcessError', 'ConnectionAbortedError',
'ConnectionRefusedError', 'ConnectionResetError', 'FileExistsError',
'FileNotFoundError', 'IsADirectoryError', 'NotADirectoryError',
'InterruptedError', 'PermissionError', 'ProcessLookupError', 'TimeoutError',
'open', 'copyright', 'credits', 'license', 'help', '__IPYTHON__', 'display',
'get_ipython'])

The field f_code is a mysterious code object:

>>> print(f.f_code)
<code object <module> at 0x10d0d1810, file "<ipython-input-26-dac680851f0c>",
line 3>
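
The code object becomes less mysterious once we inspect a few of its attributes; a minimal sketch (run it as a script so that co_name and co_filename have meaningful values):

import inspect

f = inspect.currentframe()
print(f.f_code.co_name)         # The function name, or '<module>' at top level.
print(f.f_code.co_filename)     # The file the code was compiled from.
print(f.f_code.co_firstlineno)  # The first source line of the code block.
del f  # Drop the reference when done (see the Danger note below).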

Danger

Because a frame object holds references to everything the running code uses, make sure to break the reference to it after you finish using it:

>>> f.clear()
>>> del f

If we do not do it, it may take a long time for the interpreter to break the reference for you.

An example of using the frame object is to print the stack frame in a custom way:

Custom code for showing stack frame (showframe.py).
#!/usr/bin/env python3

import sys
import inspect

def main():
    for it, fi in enumerate(inspect.stack()):
        sys.stdout.write('frame #{}:\n  {}\n\n'.format(it, fi))

if __name__ == '__main__':
    main()
$ ./showframe.py
frame #0:
  FrameInfo(frame=<frame at 0x7f8d4c31fdc0, file './showframe.py', line 8, code main>,
  filename='./showframe.py', lineno=7, function='main',
  code_context=['    for it, fi in enumerate(inspect.stack()):\n'], index=0)

frame #1:
  FrameInfo(frame=<frame at 0x104762450, file './showframe.py', line 11, code <module>>,
  filename='./showframe.py', lineno=11, function='<module>',
  code_context=['    main()\n'], index=0)
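inspect.stack() essentially follows the f_back links for us. A minimal sketch that walks the stack by hand (walk_stack is a hypothetical helper name):

import inspect

def walk_stack():
    f = inspect.currentframe()
    it = 0
    while f is not None:
        print('frame #{}: {} at line {}'.format(
            it, f.f_code.co_name, f.f_lineno))
        f = f.f_back  # Move one level out, to the caller's frame.
        it += 1

walk_stack()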

Module Magic with meta_path

Python importlib allows a high degree of freedom in customizing how modules are imported. Not many people know about these capabilities, and perhaps one of the most useful hidden gems is sys.meta_path.

Here I will use an example to show how to use sys.meta_path to customize module loading. I will use a module, onemod, located in an alternate directory, altdir/, and ask Python to load it from the non-standard location.

Note

Before running the example, make a shallow copy of the list to back it up:

>>> # Bookkeeping code: keep the original meta_path.
>>> import sys
>>> old_meta_path = sys.meta_path[:]

sys.meta_path defines a list of importlib.abc.MetaPathFinder objects for customizing the import process. Take a look at the contents in sys.meta_path:

>>> sys.meta_path = old_meta_path  # Reset the list.
>>> print(sys.meta_path)
[<class '_frozen_importlib.BuiltinImporter'>, <class
'_frozen_importlib.FrozenImporter'>, <class
'_frozen_importlib_external.PathFinder'>]

At this point, onemod cannot be imported, because the path altdir/ is not in sys.path:

>>> import onemod
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'onemod'

In normal Python code, you would modify sys.path to include the path altdir/ so that onemod can be imported. Here we will use a MetaPathFinder instead. Derive from the abstract base class (ABC) and override the find_spec() method to tell Python to load the module onemod from the place we specify.

For our path finder to work, we need to properly set up an importlib.machinery.ModuleSpec and create an importlib.machinery.SourceFileLoader object:

import os
import importlib.abc
import importlib.machinery

class MyMetaPathFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == 'onemod':
            print('DEBUG: fullname: {} , path: {} , target: {}'.format(
                fullname, path, target))
            fpath = os.path.abspath('altdir/onemod.py')
            loader = importlib.machinery.SourceFileLoader('onemod', fpath)
            return importlib.machinery.ModuleSpec(fullname, loader, origin=fpath)
        else:
            return None

Add an instance of MyMetaPathFinder to sys.meta_path:

>>> sys.meta_path = old_meta_path + [MyMetaPathFinder()]
>>> print(sys.meta_path)
[<class '_frozen_importlib.BuiltinImporter'>, <class
'_frozen_importlib.FrozenImporter'>, <class
'_frozen_importlib_external.PathFinder'>, <__main__.MyMetaPathFinder object
at 0x10117b850>]

With the meta path finder inserted, onemod can be imported:

>>> import onemod
DEBUG: fullname: onemod , path: None , target: None
>>> print("show content in onemod module:", onemod.content)
show content in onemod module: string in onemod

The finder limits the special loading scheme to the specific module onemod. To test, ask Python to import a module that does not exist:

>>> import one_non_existing_module
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'one_non_existing_module'

Take a look at the module we loaded and compare it with a “normal” module.

>>> import re
>>> print('onemod:', onemod)
onemod: <module 'onemod' (/Users/yungyuc/work/web/ynote/nsd/12advpy/code/altdir/onemod.py)>
>>> print('re:', re)
re: <module 're' from '/Users/yungyuc/hack/usr/opt39_210210/lib/python3.9/re.py'>

The module objects have an important field __spec__, which is the ModuleSpec we created:

>>> print('onemod.__spec__:', onemod.__spec__)
onemod.__spec__: ModuleSpec(name='onemod',
loader=<_frozen_importlib_external.SourceFileLoader object at 0x10117bd30>,
origin='/Users/yungyuc/work/web/ynote/nsd/12advpy/code/altdir/onemod.py')
>>> print('re.__spec__:', re.__spec__)
re.__spec__: ModuleSpec(name='re',
loader=<_frozen_importlib_external.SourceFileLoader object at 0x1010b4fa0>,
origin='/Users/yungyuc/hack/usr/opt39_210210/lib/python3.9/re.py')
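
As a side note, the standard library can build such a spec for us: importlib.util.spec_from_file_location() creates a ModuleSpec with a suitable loader in one call. A sketch of an alternate finder using it (MyMetaPathFinder2 is a hypothetical name, same assumptions as above):

import os
import importlib.abc
import importlib.util

class MyMetaPathFinder2(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == 'onemod':
            fpath = os.path.abspath('altdir/onemod.py')
            # Builds the ModuleSpec and the SourceFileLoader for us.
            return importlib.util.spec_from_file_location(fullname, fpath)
        return None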

Descriptor


Python is very flexible in accessing attributes in an object. There are multiple ways to customize the access, and the descriptor protocol provides the most versatile API, allowing us to route attribute access anywhere [1].

Naive Accessor

To show how descriptors work, make a naive accessor class (by following the descriptor protocol):

class ClsAccessor:
    """Routing access to all instance attributes to the descriptor object."""
    def __init__(self, name):
        self._name = name
        self._val = None
    def __get__(self, obj, objtype):
        print('On object {} , retrieve: {}'.format(obj, self._name))
        return self._val
    def __set__(self, obj, val):
        print('On object {} , update: {}'.format(obj, self._name))
        self._val = val

Use the descriptor in a class:

class MyClass:
    x = ClsAccessor('x')

See the message printed while getting the attribute x:

>>> o = MyClass()
>>> print(o.x)
On object <__main__.MyClass object at 0x1011c02b0> , retrieve: x
None

Setting the attribute also shows a message:

>>> o.x = 10
On object <__main__.MyClass object at 0x1011c02b0> , update: x
>>> print(o.x)
On object <__main__.MyClass object at 0x1011c02b0> , retrieve: x
10

This naive descriptor has a problem. Because the attribute value is kept in the descriptor object, and the descriptor is kept in the class object, attributes of all instances of MyClass share the same value:

>>> o2 = MyClass()
>>> print(o2.x) # Already set, not None!
On object <__main__.MyClass object at 0x1011c02e0> , retrieve: x
10
>>> o2.x = 100 # Set the value on o2.
On object <__main__.MyClass object at 0x1011c02e0> , update: x
>>> print(o.x) # The value of o changes too.
On object <__main__.MyClass object at 0x1011c02b0> , retrieve: x
100

Keep Data on the Instance

Having all instances share the attribute value is usually undesirable, but of course the descriptor protocol allows us to bind the values to the instance. Let us change the accessor class a little bit:

class InsAccessor:
    """Routing access to all instance attributes to alternate names on the instance."""
    def __init__(self, name):
        self._name = name
    def __get__(self, obj, objtype):
        print('On object {} , instance retrieve: {}'.format(obj, self._name))
        varname = '_acs' + self._name
        if not hasattr(obj, varname):
            setattr(obj, varname, None)
        return getattr(obj, varname)
    def __set__(self, obj, val):
        print('On object {} , instance update: {}'.format(obj, self._name))
        varname = '_acs' + self._name
        return setattr(obj, varname, val)

The key to preserving the value on the instance is the name mangling in __get__() and __set__(): we mangle the variable name ('_acs' + self._name) and use it to store the value on the instance. Now add the descriptor to a class:

class MyClass2:
    x = InsAccessor('x')

Create the first instance. The descriptor value can be correctly set and retrieved:

>>> mo = MyClass2()
>>> print(mo.x) # The value is uninitialized
On object <__main__.MyClass2 object at 0x101190250> , instance retrieve: x
None
>>> mo.x = 10
On object <__main__.MyClass2 object at 0x101190250> , instance update: x
>>> print(mo.x)
On object <__main__.MyClass2 object at 0x101190250> , instance retrieve: x
10

Create another instance. According to our implementation, what we did to the first instance is not seen in the second one:

>>> mo2 = MyClass2()
>>> print(mo2.x) # The value remains uninitialized
On object <__main__.MyClass2 object at 0x101190a90> , instance retrieve: x
None
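
The value lives on the instance under the mangled name, which we can confirm by peeking into the instance namespace of the first object:

>>> print(mo.__dict__)
{'_acsx': 10}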

Metaclass

Metaclasses are a mechanism for meta-programming in Python. That is, metaclasses change how Python code works by writing Python code, but without using a code generator.

Class is an Object

In Python, a class is also an object, which is of the type “type”. Let us observe this interesting fact. Make a class:

class ClassIsObject:
    pass

The class can be manipulated like a normal object:

>>> print(ClassIsObject) # Operate on the class itself, not an instance of the class
<class '__main__.ClassIsObject'>

The class has its own namespace (__dict__):

>>> print(ClassIsObject.__dict__) # The namespace of the class, not of the instance
{'__module__': '__main__',
 '__dict__': <attribute '__dict__' of 'ClassIsObject' objects>,
 '__weakref__': <attribute '__weakref__' of 'ClassIsObject' objects>,
 '__doc__': None}

The class is an object as well as a type. A type is also an object:

>>> isinstance(ClassIsObject, object)
True
>>> isinstance(ClassIsObject, type)
True
>>> isinstance(type, object)
True
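
We can also ask for the type of the class directly. The type of a class is type itself:

>>> print(type(ClassIsObject))
<class 'type'>
>>> print(type(type))
<class 'type'>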

Customize Class Creation

Knowing that classes are just Python objects, we can now discuss how to customize class creation using metaclasses. We will continue with the accessor example from Descriptor. In the previous example, the descriptor object needed to take an argument for its name:

x = InsAccessor('x')

I would like to lift that inconvenience. First, I create a new descriptor:

class AutoAccessor:
    """Routing access to all instance attributes to alternate names on the instance."""
    def __init__(self):
        self.name = None
    def __get__(self, obj, objtype):
        print('On object {} , auto retrieve: {}'.format(obj, self.name))
        varname = '_acs' + self.name
        if not hasattr(obj, varname):
            setattr(obj, varname, None)
        return getattr(obj, varname)
    def __set__(self, obj, val):
        print('On object {} , auto update: {}'.format(obj, self.name))
        varname = '_acs' + self.name
        return setattr(obj, varname, val)

The new descriptor class AutoAccessor does not take the attribute name in the constructor. Then I create a corresponding metaclass:

class AutoAccessorMeta(type):
    """Metaclass to use the new automatic descriptor."""
    def __new__(cls, name, bases, namespace):
        print('DEBUG before names:', name)
        print('DEBUG before bases:', bases)
        print('DEBUG before namespace:', namespace)
        for k, v in namespace.items():
            if isinstance(v, AutoAccessor):
                v.name = k
        # Create the class object for MyAutoClass.
        newcls = super(AutoAccessorMeta, cls).__new__(cls, name, bases, namespace)
        print('DEBUG after names:', name)
        print('DEBUG after bases:', bases)
        print('DEBUG after namespace:', namespace)
        return newcls

The metaclass AutoAccessorMeta assigns the correct attribute name to each descriptor. We will compare the effects of the metaclass by creating two classes. The first uses AutoAccessor without the metaclass:

>>> class MyAutoClassDefault(metaclass=type):
...     x = AutoAccessor()
...

The second uses the metaclass, which scans the class namespace and assigns the attribute name to the corresponding descriptor:

>>> class MyAutoClass(metaclass=AutoAccessorMeta):
...     x = AutoAccessor()  # Note: no name is given.
...
DEBUG before names: MyAutoClass
DEBUG before bases: ()
DEBUG before namespace: {'__module__': '__main__',
'__qualname__': 'MyAutoClass',
'x': <__main__.AutoAccessor object at 0x10117bcd0>}
DEBUG after names: MyAutoClass
DEBUG after bases: ()
DEBUG after namespace: {'__module__': '__main__',
'__qualname__': 'MyAutoClass',
'x': <__main__.AutoAccessor object at 0x10117bcd0>}

Now we have successfully upgraded the descriptor to avoid the explicit argument for the attribute name:

>>> ao = MyAutoClass()
>>> print(ao.x) # The value is uninitialized
On object <__main__.MyAutoClass object at 0x101190460> , auto retrieve: x
None
>>> ao.x = 10
On object <__main__.MyAutoClass object at 0x101190460> , auto update: x
>>> print(ao.x)
On object <__main__.MyAutoClass object at 0x101190460> , auto retrieve: x
10
>>> print(ao._acsx)
10

Abstract Base Class (ABC)

Python is object-oriented and supports inheritance. Most of the time we use a simple inheritance relation, and it works as expected.

Create two classes with a simple inheritance relationship:
>>> class MyBaseClass:
...     pass
...
>>> class MyDerivedClass(MyBaseClass):
...     pass
...
>>> base = MyBaseClass()
>>> derived = MyDerivedClass()

The instance “base” instantiated from MyBaseClass is an instance of MyBaseClass but not MyDerivedClass:

>>> print('base {} MyBaseClass'.format(
...     'is' if isinstance(base, MyBaseClass) else 'is not'))
base is MyBaseClass
>>> print('base {} MyDerivedClass'.format(
...     'is' if isinstance(base, MyDerivedClass) else 'is not'))
base is not MyDerivedClass

The instance “derived” instantiated from MyDerivedClass is an instance of both MyBaseClass and MyDerivedClass:

>>> print('derived {} MyBaseClass'.format(
...     'is' if isinstance(derived, MyBaseClass) else 'is not'))
derived is MyBaseClass
>>> print('derived {} MyDerivedClass'.format(
...     'is' if isinstance(derived, MyDerivedClass) else 'is not'))
derived is MyDerivedClass

No surprises so far. But Python also supports the abstract base class (abc), which can turn all of this upside down.

Method Resolution Order (MRO)

When we need to use an ABC, the inheritance is usually much more complex than what we just described and involves multiple inheritance. There needs to be a definite way to resolve the multiple inheritance, and Python uses the “C3” algorithm [2]. The description can also be found in the Python method resolution order (MRO) documentation [3].

Let us see how the MRO works with a single diamond of inheritance:

Example of single diamond inheritance.
class A:
    def process(self):
        print('A process()')

class B(A):
    def process(self):
        print('B process()')
        super(B, self).process()

class C(A):
    def process(self):
        print('C process()')
        super(C, self).process()

class D(B, C):
    pass

In the above code, the inheritance relationships among the four classes form a “diamond”. The MRO is:

>>> print(D.__mro__)
(<class '__main__.D'>,
<class '__main__.B'>, <class '__main__.C'>,
<class '__main__.A'>, <class 'object'>)
>>> obj = D()
>>> obj.process()
B process()
C process()
A process()

If we change the order in the inheritance declaration:

class D(C, B):
    pass

the MRO changes accordingly:

>>> print(D.__mro__)
(<class '__main__.D'>,
<class '__main__.C'>, <class '__main__.B'>,
<class '__main__.A'>, <class 'object'>)

In a more complex inheritance relationship, there may be more than a single diamond. The following example has 3 diamonds crossing multiple levels:

Example of multiple diamond inheritance.
O = object
class F(O): pass
class E(O): pass
class D(O): pass
class C(D, F): pass
class B(D, E): pass
class A(B, C): pass

The MRO of the complex inheritance is:

>>> print(A.__mro__)
(<class '__main__.A'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.D'>,
<class '__main__.E'>, <class '__main__.F'>, <class 'object'>)
>>> print(B.__mro__)
(<class '__main__.B'>, <class '__main__.D'>, <class '__main__.E'>, <class 'object'>)
>>> print(C.__mro__)
(<class '__main__.C'>, <class '__main__.D'>, <class '__main__.F'>, <class 'object'>)
>>> print(D.__mro__)
(<class '__main__.D'>, <class 'object'>)
>>> print(E.__mro__)
(<class '__main__.E'>, <class 'object'>)
>>> print(F.__mro__)
(<class '__main__.F'>, <class 'object'>)

And an instance of A is an instance of all the classes based on the inheritance rule:

>>> a = A()
>>> print('a {} A'.format('is' if isinstance(a, A) else 'is not'))
a is A
>>> print('a {} B'.format('is' if isinstance(a, B) else 'is not'))
a is B
>>> print('a {} C'.format('is' if isinstance(a, C) else 'is not'))
a is C
>>> print('a {} D'.format('is' if isinstance(a, D) else 'is not'))
a is D
>>> print('a {} E'.format('is' if isinstance(a, E) else 'is not'))
a is E
>>> print('a {} F'.format('is' if isinstance(a, F) else 'is not'))
a is F

Note

In production code, we usually do not want to deal with inheritance that is so complex. If possible, try to avoid it through the system design.

Virtual Base Class

The Python abstract base class (abc) provides the capability to overload isinstance() and issubclass(), and to define abstract methods.

We can use the abc.ABCMeta.register() method to make a class MyABC, which is not in the inheritance chain of another class A, a “virtual” base class of the latter.

import abc

class MyABC(metaclass=abc.ABCMeta):
    pass

Warning

Python “virtual” base classes have nothing to do with C++ virtual base classes.

As we know, A is not a subclass of MyABC:

>>> print('A {} a subclass of MyABC'.format('is' if issubclass(A, MyABC) else 'is not'))
A is not a subclass of MyABC

But once we “register” MyABC as a virtual base class of A, we will see that A becomes a subclass of MyABC:

>>> MyABC.register(A)
<class '__main__.A'>
>>> print('A {} a subclass of MyABC'.format('is' if issubclass(A, MyABC) else 'is not'))
A is a subclass of MyABC
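
The registration affects isinstance() as well. Continuing with the instance a created earlier, it now tests as a MyABC instance:

>>> print('a {} MyABC'.format('is' if isinstance(a, MyABC) else 'is not'))
a is MyABC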

Abstract Methods

Using abc, we can add abstract methods to a class (making it abstract).

class AbstractClass(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def process(self):
        pass

An abstract class cannot be instantiated:

>>> a = AbstractClass()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class AbstractClass with abstract method process

In a derived class, the abstract method needs to be overridden:

class GoodConcreteClass(AbstractClass):
    def process(self):
        print('GoodConcreteClass process')

Then the good concrete class can be instantiated and run:

>>> g = GoodConcreteClass()
>>> g.process()
GoodConcreteClass process

If the abstract method is not overridden

class BadConcreteClass(AbstractClass):
    pass

the derived class cannot be instantiated:

>>> b = BadConcreteClass()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class BadConcreteClass with abstract method process

References

[1] Descriptor HowTo Guide, https://docs.python.org/3/howto/descriptor.html.

[2] K. Barrett, B. Cassels, P. Haahr, D. A. Moon, K. Playford, and P. T. Withington, “A monotonic superclass linearization for Dylan,” SIGPLAN Not., vol. 31, no. 10, pp. 69–82, Oct. 1996, doi: 10.1145/236338.236343, https://dl.acm.org/doi/10.1145/236338.236343.

[3] The Python 2.3 Method Resolution Order, https://www.python.org/download/releases/2.3/mro/.