Advanced Python¶
Iterator¶
When data in memory are processed, the elements are usually iterated over one by one. Assume we have a list of 10 elements.
>>> data = list(range(10))
>>> print(data, type(data))
Python uses the iterator protocol to get one element at a time:
class ListIterator:
    def __init__(self, data, offset=0):
        self.data = data
        self.it = None  # index of the last returned element
        self.offset = offset
    def __iter__(self):
        # An iterator returns itself.
        return self
    def __next__(self):
        # Start from index 0 on the first call, then advance by one.
        nxt = 0 if self.it is None else self.it + 1
        if nxt >= len(self.data):
            # No element left (this also covers an empty list).
            raise StopIteration
        self.it = nxt
        return self.data[self.it] + self.offset
list_iterator = ListIterator(data)
The for ... in ... construct applies to the iterator object. Every time the construct needs the next element, ListIterator.__next__() is called:
>>> for i in list_iterator:
...     print(i)
...
0
1
2
3
4
5
6
7
8
9
List comprehension:
>>> print([value+100 for value in data])
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
>>> print(list_iterator)
<__main__.ListIterator object at 0x10cfaebd0>
>>> print(dir(list_iterator))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__',
'__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'data', 'it', 'offset']
Of course, you don’t really need to write your own ListIterator for iterating over a list, because Python already has a built-in iterator:
>>> list_iterator2 = iter(data)
>>> for i in list_iterator2:
...     print(i)
...
0
1
2
3
4
5
6
7
8
9
>>> print(list_iterator2)
<list_iterator object at 0x10cfb2990>
>>> print(dir(list_iterator2))
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__',
'__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__', '__iter__', '__le__', '__length_hint__', '__lt__',
'__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__']
The built-in iterator is created by calling the container.__iter__() method on the container object (iter() simply does it for you):
>>> list_iterator3 = data.__iter__()
>>> print(list_iterator3)
<list_iterator object at 0x10cfbab90>
And the for ... in ... construct actually knows about the iterator protocol, so it can take the container directly:
>>> for i in data:
...     print(i)
...
0
1
2
3
4
5
6
7
8
9
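As a rough sketch of what the for ... in ... construct does with the iterator protocol, the loop can be spelled out by hand. The helper name manual_for is hypothetical and only serves this illustration:

```python
data = list(range(10))

def manual_for(iterable):
    """Collect elements the way a for loop fetches them."""
    collected = []
    iterator = iter(iterable)       # calls iterable.__iter__()
    while True:
        try:
            value = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                   # the loop ends silently
        collected.append(value)
    return collected

result = manual_for(data)
print(result)
```

StopIteration is the signal that ends the loop; the for construct catches it for us.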
List Comprehension¶
List comprehension is the construct [... for ... in ...]. Python borrowed the syntax of list comprehension from other languages, e.g., Haskell, and it follows the iterator protocol. It is a convenient way to build a list from any iterable.
For example, the above for loop can be replaced by a one-liner:
>>> print("\n".join([str(i) for i in data]))
0
1
2
3
4
5
6
7
8
9
Generator¶
def list_generator(input_data):
    for i in input_data:
        yield i
>>> generator = list_generator(data)
>>> print(generator)
<generator object list_generator at 0x10cf756d0>
>>> print(dir(generator))
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__', '__iter__', '__le__', '__lt__', '__name__', '__ne__',
'__new__', '__next__', '__qualname__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'close', 'gi_code', 'gi_frame', 'gi_running', 'gi_yieldfrom', 'send',
'throw']
>>> for i in list_generator(data):
...     print(i)
...
0
1
2
3
4
5
6
7
8
9
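A generator object follows the same iterator protocol, and its body only runs when the next element is requested. The following is a minimal sketch that drives the generator by hand; the variable names are chosen only for this illustration:

```python
def list_generator(input_data):
    for i in input_data:
        yield i

gen = list_generator([10, 20])
is_own_iterator = iter(gen) is gen  # a generator is its own iterator
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes right after the first yield
try:
    next(gen)        # the body finishes without yielding again
    exhausted = False
except StopIteration:
    exhausted = True
print(first, second, exhausted, is_own_iterator)
```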
Generator Expression¶
A more convenient way of creating a generator is to use the generator expression (... for ... in ...). Note that this looks like the list comprehension [... for ... in ...], but uses parentheses in place of the brackets.
>>> generator2 = (i for i in data)
>>> print(generator2)
<generator object <genexpr> at 0x10cfce1d0>
>>> print(dir(generator2))
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__', '__iter__', '__le__', '__lt__', '__name__', '__ne__',
'__new__', '__next__', '__qualname__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'close', 'gi_code', 'gi_frame', 'gi_running', 'gi_yieldfrom', 'send',
'throw']
>>> for i in generator2:
...     print(i)
...
0
1
2
3
4
5
6
7
8
9
By using the generator expression, the data printing one-liner can drop the brackets:
>>> print("\n".join(str(i) for i in data))
0
1
2
3
4
5
6
7
8
9
>>> # Compare with the list comprehension:
>>> print("\n".join( [ str(i) for i in data ] ))
0
1
2
3
4
5
6
7
8
9
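The two forms print the same text, but they behave differently underneath: the list comprehension builds the whole list up front, while the generator expression produces elements on demand and can be consumed only once. A small sketch (the variable names are illustrative):

```python
data = list(range(10))

squares_list = [i * i for i in data]   # built eagerly, stored in memory
squares_gen = (i * i for i in data)    # nothing is computed yet

first_pass = sum(squares_gen)   # consumes the generator
second_pass = sum(squares_gen)  # the generator is already exhausted
list_pass = sum(squares_list)   # a list can be traversed again
print(first_pass, second_pass, list_pass)
```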
Python Stack Frame¶
(C)Python uses a stack-based interpreter. We are allowed to peek at all the previous stack frames:
import traceback
def f1():
    traceback.print_stack()
def f2():
    f1()
def f3():
    f2()
>>> f3()
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in f3
File "<stdin>", line 2, in f2
File "<stdin>", line 2, in f1
frame object¶
We can get the frame object of the current stack frame using inspect.currentframe():
>>> import inspect
>>> f = inspect.currentframe()
A frame object has the following attributes:
- Namespace:
  * f_builtins: builtin namespace seen by this frame
  * f_globals: global namespace seen by this frame
  * f_locals: local namespace seen by this frame
- Other:
  * f_back: next outer frame object (this frame’s caller)
  * f_code: code object being executed in this frame
  * f_lasti: index of last attempted instruction in bytecode
  * f_lineno: current line number in Python source code
>>> print([k for k in dir(f) if not k.startswith('__')])
['clear', 'f_back', 'f_builtins', 'f_code', 'f_globals', 'f_lasti',
'f_lineno', 'f_locals', 'f_trace', 'f_trace_lines', 'f_trace_opcodes']
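The f_back and f_code attributes are enough to walk the call chain by hand. The sketch below assumes CPython (inspect.currentframe() may return None on other implementations), and the function names call_chain, inner, and outer are made up for the example:

```python
import inspect

def call_chain():
    """Return the function names on the stack, innermost first."""
    names = []
    frame = inspect.currentframe()
    while frame is not None:
        names.append(frame.f_code.co_name)  # name of the running code object
        frame = frame.f_back                # move to the caller's frame
    # The loop leaves frame as None, so no frame reference lingers here.
    return names

def inner():
    return call_chain()

def outer():
    return inner()

chain = outer()
print(chain[:3])
```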
The object tells us many things about the frame. For example, we can take a look at the builtin namespace:
>>> print(f.f_builtins.keys())
dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__',
'__build_class__', '__import__', 'abs', 'all', 'any', 'ascii', 'bin',
'breakpoint', 'callable', 'chr', 'compile', 'delattr', 'dir', 'divmod',
'eval', 'exec', 'format', 'getattr', 'globals', 'hasattr', 'hash', 'hex',
'id', 'input', 'isinstance', 'issubclass', 'iter', 'len', 'locals', 'max',
'min', 'next', 'oct', 'ord', 'pow', 'print', 'repr', 'round', 'setattr',
'sorted', 'sum', 'vars', 'None', 'Ellipsis', 'NotImplemented', 'False',
'True', 'bool', 'memoryview', 'bytearray', 'bytes', 'classmethod', 'complex',
'dict', 'enumerate', 'filter', 'float', 'frozenset', 'property', 'int',
'list', 'map', 'object', 'range', 'reversed', 'set', 'slice', 'staticmethod',
'str', 'super', 'tuple', 'type', 'zip', '__debug__', 'BaseException',
'Exception', 'TypeError', 'StopAsyncIteration', 'StopIteration',
'GeneratorExit', 'SystemExit', 'KeyboardInterrupt', 'ImportError',
'ModuleNotFoundError', 'OSError', 'EnvironmentError', 'IOError', 'EOFError',
'RuntimeError', 'RecursionError', 'NotImplementedError', 'NameError',
'UnboundLocalError', 'AttributeError', 'SyntaxError', 'IndentationError',
'TabError', 'LookupError', 'IndexError', 'KeyError', 'ValueError',
'UnicodeError', 'UnicodeEncodeError', 'UnicodeDecodeError',
'UnicodeTranslateError', 'AssertionError', 'ArithmeticError',
'FloatingPointError', 'OverflowError', 'ZeroDivisionError', 'SystemError',
'ReferenceError', 'MemoryError', 'BufferError', 'Warning', 'UserWarning',
'DeprecationWarning', 'PendingDeprecationWarning', 'SyntaxWarning',
'RuntimeWarning', 'FutureWarning', 'ImportWarning', 'UnicodeWarning',
'BytesWarning', 'ResourceWarning', 'ConnectionError', 'BlockingIOError',
'BrokenPipeError', 'ChildProcessError', 'ConnectionAbortedError',
'ConnectionRefusedError', 'ConnectionResetError', 'FileExistsError',
'FileNotFoundError', 'IsADirectoryError', 'NotADirectoryError',
'InterruptedError', 'PermissionError', 'ProcessLookupError', 'TimeoutError',
'open', 'copyright', 'credits', 'license', 'help', '__IPYTHON__', 'display',
'get_ipython'])
The f_code attribute holds a somewhat mysterious code object.
Because a frame object holds everything a construct uses, make sure to break the reference to it once you have finished using it. Otherwise, it may take a long time for the interpreter to break the reference for you.
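One common way to break the reference is to delete the local name in a finally clause, so the frame is released even when an exception is raised. This is a minimal sketch assuming CPython; current_line is a hypothetical helper name:

```python
import inspect

def current_line():
    """Return the line number being executed in this function's frame."""
    frame = inspect.currentframe()
    try:
        return frame.f_lineno
    finally:
        # The frame references its locals, and the local name references
        # the frame; deleting the name breaks that cycle right away.
        del frame

lineno = current_line()
print(lineno)
```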
An example script for showing stack frame:
#!/usr/bin/env python3

import sys
import inspect

def main():
    for it, fi in enumerate(inspect.stack()):
        sys.stdout.write('frame #{}:\n {}\n\n'.format(it, fi))

if __name__ == '__main__':
    main()
$ ./showframe.py
frame #0:
FrameInfo(frame=<frame at 0x7f8d4c31fdc0, file './showframe.py', line 8, code main>, filename='./showframe.py', lineno=7, function='main', code_context=[' for it, fi in enumerate(inspect.stack()):\n'], index=0)
frame #1:
FrameInfo(frame=<frame at 0x104762450, file './showframe.py', line 11, code <module>>, filename='./showframe.py', lineno=11, function='<module>', code_context=[' main()\n'], index=0)
Customizing Module Import with sys.meta_path¶
Python importlib allows a high degree of freedom in customizing how modules are imported. Here I will use an example module, onemod, located in an alternate directory, altdir/, and ask Python to load it from that non-standard location.
>>> import sys
>>> # Bookkeeping code: keep the original meta_path.
>>> old_meta_path = sys.meta_path[:]
importlib provides many facilities. The theme of this example is sys.meta_path. It defines a list of importlib.abc.MetaPathFinder objects for customizing the import process.
>>> sys.meta_path = old_meta_path
>>> print(sys.meta_path)
[<class '_frozen_importlib.BuiltinImporter'>, <class
'_frozen_importlib.FrozenImporter'>, <class
'_frozen_importlib_external.PathFinder'>]
At this point, onemod cannot be imported, because altdir/ is not in sys.path:
>>> import onemod
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'onemod'
In normal Python code, you would modify sys.path to include altdir/ in order to import onemod correctly. Here we will use MetaPathFinder instead. Subclass the abstract base class (ABC) and override the find_spec() method to tell it to load the onemod module from the place we specify.
For our path finder to work, we need to properly set up an importlib.machinery.ModuleSpec and create an importlib.machinery.SourceFileLoader object for it.
import os
import importlib.abc
import importlib.machinery
class MyMetaPathFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == 'onemod':
            print('DEBUG: fullname: {} , path: {} , target: {}'.format(fullname, path, target))
            fpath = os.path.abspath('altdir/onemod.py')
            loader = importlib.machinery.SourceFileLoader('onemod', fpath)
            return importlib.machinery.ModuleSpec(fullname, loader, origin=fpath)
        else:
            return None
>>> sys.meta_path = old_meta_path + [MyMetaPathFinder()]
>>> print(sys.meta_path)
[<class '_frozen_importlib.BuiltinImporter'>, <class
'_frozen_importlib.FrozenImporter'>, <class
'_frozen_importlib_external.PathFinder'>, <__main__.MyMetaPathFinder object
at 0x10117b850>]
With the meta path finder inserted, onemod can be imported:
>>> import onemod
DEBUG: fullname: onemod , path: None , target: None
>>> print("show content in onemod module:", onemod.content)
show content in onemod module: string in onemod
The finder only deals with onemod. To test, ask it to load a module that does not exist:
>>> import one_non_existing_module
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'one_non_existing_module'
See the module we loaded. Compare it with a “normal module”.
>>> import re
>>> print('onemod:', onemod)
onemod: <module 'onemod' (/Users/yungyuc/work/web/ynote/nsd/12advpy/code/altdir/onemod.py)>
>>> print('re:', re)
re: <module 're' from '/Users/yungyuc/hack/usr/opt39_210210/lib/python3.9/re.py'>
The module objects have an important field __spec__, which is the ModuleSpec we created:
>>> print('onemod.__spec__:', onemod.__spec__)
onemod.__spec__: ModuleSpec(name='onemod',
loader=<_frozen_importlib_external.SourceFileLoader object at 0x10117bd30>,
origin='/Users/yungyuc/work/web/ynote/nsd/12advpy/code/altdir/onemod.py')
>>> print('re.__spec__:', re.__spec__)
re.__spec__: ModuleSpec(name='re',
loader=<_frozen_importlib_external.SourceFileLoader object at 0x1010b4fa0>,
origin='/Users/yungyuc/hack/usr/opt39_210210/lib/python3.9/re.py')
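The walkthrough above depends on an existing altdir/onemod.py. For experimenting without that file, the following self-contained sketch writes a throwaway module into a temporary directory and imports it through a meta path finder. The names DirFinder and demo_onemod are hypothetical, and importlib.util.spec_from_file_location() is used here as a shortcut for building the ModuleSpec and loader by hand:

```python
import importlib.abc
import importlib.util
import os
import sys
import tempfile

class DirFinder(importlib.abc.MetaPathFinder):
    """Find one named module in one directory (hypothetical helper)."""

    def __init__(self, modname, dirname):
        self.modname = modname
        self.fpath = os.path.join(dirname, modname + '.py')

    def find_spec(self, fullname, path, target=None):
        if fullname != self.modname:
            return None  # let the other finders handle it
        return importlib.util.spec_from_file_location(fullname, self.fpath)

with tempfile.TemporaryDirectory() as altdir:
    # Create the module file in the alternate (temporary) directory.
    with open(os.path.join(altdir, 'demo_onemod.py'), 'w') as fobj:
        fobj.write("content = 'string in demo_onemod'\n")
    finder = DirFinder('demo_onemod', altdir)
    sys.meta_path.append(finder)
    try:
        import demo_onemod
        loaded_content = demo_onemod.content
    finally:
        sys.meta_path.remove(finder)  # restore the import machinery

print(loaded_content)
```

Removing the finder in the finally clause plays the same role as restoring old_meta_path in the walkthrough.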
Descriptor¶
The descriptor protocol allows us to route attribute access to anywhere outside the class.
class ClsAccessor:
    """Routing access to all instance attributes to the descriptor object."""
    def __init__(self, name):
        self._name = name
        self._val = None
    def __get__(self, obj, objtype):
        print('On object {} , retrieve: {}'.format(obj, self._name))
        return self._val
    def __set__(self, obj, val):
        print('On object {} , update: {}'.format(obj, self._name))
        self._val = val
class MyClass:
    x = ClsAccessor('x')
See the message printed while getting the attribute x:
>>> o = MyClass()
>>> print(o.x)
On object <__main__.MyClass object at 0x1011c02b0> , retrieve: x
None
Setting the attribute also shows a message:
>>> o.x = 10
On object <__main__.MyClass object at 0x1011c02b0> , update: x
>>> print(o.x)
On object <__main__.MyClass object at 0x1011c02b0> , retrieve: x
10
Because the attribute value is kept in the descriptor, and the descriptor is kept in the class object, all instances of MyClass share the same attribute value.
>>> o2 = MyClass()
>>> print(o2.x) # Not None!
On object <__main__.MyClass object at 0x1011c02e0> , retrieve: x
10
>>> o2.x = 100
On object <__main__.MyClass object at 0x1011c02e0> , update: x
>>> print(o.x)
On object <__main__.MyClass object at 0x1011c02b0> , retrieve: x
100
Keep Data on the Instance¶
Having all instances share the attribute value isn’t always desirable. The descriptor protocol also allows binding the values to the instance.
class InsAccessor:
    """Routing access to all instance attributes to alternate names on the instance."""
    def __init__(self, name):
        self._name = name
    def __get__(self, obj, objtype):
        print('On object {} , instance retrieve: {}'.format(obj, self._name))
        varname = '_acs' + self._name
        if not hasattr(obj, varname):
            setattr(obj, varname, None)
        return getattr(obj, varname)
    def __set__(self, obj, val):
        print('On object {} , instance update: {}'.format(obj, self._name))
        varname = '_acs' + self._name
        return setattr(obj, varname, val)
class MyClass2:
    x = InsAccessor('x')
Create an instance to test the descriptor.
>>> mo = MyClass2()
>>> print(mo.x)
On object <__main__.MyClass2 object at 0x101190250> , instance retrieve: x
None
>>> mo.x = 10
On object <__main__.MyClass2 object at 0x101190250> , instance update: x
>>> print(mo.x)
On object <__main__.MyClass2 object at 0x101190250> , instance retrieve: x
10
A new instance starts with the initial value:
>>> mo2 = MyClass2()
>>> print(mo2.x)
On object <__main__.MyClass2 object at 0x101190a90> , instance retrieve: x
None
Metaclass¶
A Python class is also an object.
class ClassIsObject:
    pass
>>> print(ClassIsObject)
<class '__main__.ClassIsObject'>
>>> print(ClassIsObject.__dict__)
{'__module__': '__main__',
'__dict__': <attribute '__dict__' of 'ClassIsObject' objects>,
'__weakref__': <attribute '__weakref__' of 'ClassIsObject' objects>,
'__doc__': None}
>>> isinstance(ClassIsObject, object)
True
>>> isinstance(ClassIsObject, type)
True
>>> isinstance(type, object)
True
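Because a class is an instance of type, it can also be created by calling type directly with a name, a tuple of bases, and a namespace, which are the same three arguments a metaclass receives. A short sketch with made-up names (Dynamic, greet, answer):

```python
def greet(self):
    return 'hello from ' + type(self).__name__

# type(name, bases, namespace) builds a class object at run time.
Dynamic = type('Dynamic', (object,), {'greet': greet, 'answer': 42})

obj = Dynamic()
print(obj.greet(), obj.answer)
print(type(Dynamic))
```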
Metaclasses allow programmers to customize class creation.
class AutoAccessor:
    """Routing access to all instance attributes to alternate names on the instance."""
    def __init__(self):
        self.name = None
    def __get__(self, obj, objtype):
        print('On object {} , auto retrieve: {}'.format(obj, self.name))
        varname = '_acs' + self.name
        if not hasattr(obj, varname):
            setattr(obj, varname, None)
        return getattr(obj, varname)
    def __set__(self, obj, val):
        print('On object {} , auto update: {}'.format(obj, self.name))
        varname = '_acs' + self.name
        return setattr(obj, varname, val)
class AutoAccessorMeta(type):
    def __new__(cls, name, bases, namespace):
        print('DEBUG before names:', name)
        print('DEBUG before bases:', bases)
        print('DEBUG before namespace:', namespace)
        for k, v in namespace.items():
            if isinstance(v, AutoAccessor):
                v.name = k
        # Create the class object for MyAutoClass.
        newcls = super(AutoAccessorMeta, cls).__new__(cls, name, bases, namespace)
        print('DEBUG after names:', name)
        print('DEBUG after bases:', bases)
        print('DEBUG after namespace:', namespace)
        return newcls
We will use the descriptor to test the metaclass. The new descriptor class AutoAccessor doesn’t take the attribute name in the constructor. Instead, AutoAccessorMeta assigns the correct attribute name.
>>> class MyAutoClassDefault(metaclass=type):
...     x = AutoAccessor()
...
>>> class MyAutoClass(metaclass=AutoAccessorMeta):
...     x = AutoAccessor() # Note: no name is given.
...
DEBUG before names: MyAutoClass
DEBUG before bases: ()
DEBUG before namespace: {'__module__': '__main__',
'__qualname__': 'MyAutoClass',
'x': <__main__.AutoAccessor object at 0x10117bcd0>}
DEBUG after names: MyAutoClass
DEBUG after bases: ()
DEBUG after namespace: {'__module__': '__main__',
'__qualname__': 'MyAutoClass',
'x': <__main__.AutoAccessor object at 0x10117bcd0>}
>>> ao = MyAutoClass()
>>> print(ao.x)
On object <__main__.MyAutoClass object at 0x101190460> , auto retrieve: x
None
>>> ao.x = 10
On object <__main__.MyAutoClass object at 0x101190460> , auto update: x
>>> print(ao.x)
On object <__main__.MyAutoClass object at 0x101190460> , auto retrieve: x
10
>>> print(ao._acsx)
10
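For this particular job, passing the attribute name to a descriptor, Python 3.6 and later also offer the __set_name__() hook, which is called on each descriptor when the owning class is created. The sketch below shows that alternative; it is not the metaclass approach used above, and the class names NamedAccessor and MyNamedClass are hypothetical:

```python
class NamedAccessor:
    """Descriptor that learns its attribute name from __set_name__."""

    def __set_name__(self, owner, name):
        # Called automatically when the owning class is created.
        self.varname = '_acs' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        return getattr(obj, self.varname, None)

    def __set__(self, obj, val):
        setattr(obj, self.varname, val)

class MyNamedClass:
    x = NamedAccessor()  # no name given, no metaclass needed

named = MyNamedClass()
initial = named.x
named.x = 10
print(initial, named.x, named._acsx)
```

A metaclass remains the more general tool, since it can inspect and change the whole namespace rather than a single attribute.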
Type Introspection and Abstract Base Class (abc)¶
>>> class MyBaseClass:
...     pass
...
>>> class MyDerivedClass(MyBaseClass):
...     pass
...
>>> base = MyBaseClass()
>>> derived = MyDerivedClass()
>>> print('base {} MyBaseClass'.format('is' if isinstance(base, MyBaseClass) else 'is not'))
base is MyBaseClass
>>> print('base {} MyDerivedClass'.format('is' if isinstance(base, MyDerivedClass) else 'is not'))
base is not MyDerivedClass
>>> print('derived {} MyBaseClass'.format('is' if isinstance(derived, MyBaseClass) else 'is not'))
derived is MyBaseClass
>>> print('derived {} MyDerivedClass'.format('is' if isinstance(derived, MyDerivedClass) else 'is not'))
derived is MyDerivedClass
Method Resolution Order (mro)¶
Python uses the “C3” algorithm to determine the [method resolution order (MRO)](https://www.python.org/download/releases/2.3/mro/) [1].
class A:
    def process(self):
        print('A process()')
class B(A):
    def process(self):
        print('B process()')
        super(B, self).process()
class C(A):
    def process(self):
        print('C process()')
        super(C, self).process()
class D(B, C):
    pass
>>> print(D.__mro__)
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <class 'object'>)
>>> obj = D()
>>> obj.process()
B process()
C process()
A process()
Change the order in the inheritance declaration and the MRO changes accordingly.
>>> class D(C, B):
...     pass
...
>>> print(D.__mro__)
(<class '__main__.D'>, <class '__main__.C'>, <class '__main__.B'>, <class '__main__.A'>, <class 'object'>)
Example: Multiple-Level Inheritance¶
O = object
class F(O): pass
class E(O): pass
class D(O): pass
class C(D, F): pass
class B(D, E): pass
class A(B, C): pass
>>> print(A.__mro__)
(<class '__main__.A'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.D'>, <class '__main__.E'>, <class '__main__.F'>, <class 'object'>)
>>> print(B.__mro__)
(<class '__main__.B'>, <class '__main__.D'>, <class '__main__.E'>, <class 'object'>)
>>> print(C.__mro__)
(<class '__main__.C'>, <class '__main__.D'>, <class '__main__.F'>, <class 'object'>)
>>> print(D.__mro__)
(<class '__main__.D'>, <class 'object'>)
>>> print(E.__mro__)
(<class '__main__.E'>, <class 'object'>)
>>> print(F.__mro__)
(<class '__main__.F'>, <class 'object'>)
>>> a = A()
>>> print('a {} A'.format('is' if isinstance(a, A) else 'is not'))
a is A
>>> print('a {} B'.format('is' if isinstance(a, B) else 'is not'))
a is B
>>> print('a {} C'.format('is' if isinstance(a, C) else 'is not'))
a is C
>>> print('a {} D'.format('is' if isinstance(a, D) else 'is not'))
a is D
>>> print('a {} E'.format('is' if isinstance(a, E) else 'is not'))
a is E
>>> print('a {} F'.format('is' if isinstance(a, F) else 'is not'))
a is F
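To see where the order above comes from, here is a compact sketch of the C3 merge rule, checked against the same hierarchy. It is an illustration only, not CPython's implementation, and the function names c3_merge and c3_linearize are made up for the example:

```python
def c3_merge(sequences):
    """Merge linearizations following the C3 rule."""
    sequences = [list(s) for s in sequences if s]
    result = []
    while sequences:
        for seq in sequences:
            head = seq[0]
            # A good head does not appear in the tail of any sequence.
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise TypeError('inconsistent hierarchy')
        result.append(head)
        # Drop the chosen head everywhere, then discard emptied sequences.
        sequences = [[c for c in s if c is not head] for s in sequences]
        sequences = [s for s in sequences if s]
    return result

def c3_linearize(cls):
    """Linearize cls from its bases' linearizations and the base order."""
    bases = list(cls.__bases__)
    return [cls] + c3_merge([c3_linearize(b) for b in bases] + [bases])

O = object
class F(O): pass
class E(O): pass
class D(O): pass
class C(D, F): pass
class B(D, E): pass
class A(B, C): pass

computed = c3_linearize(A)
print([c.__name__ for c in computed])
```

D is picked only after C, because D still appears in the tail of C's linearization when B has been consumed.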
Abstract Base Class (abc)¶
The Python abstract base class (abc) module provides the capability to customize isinstance() and issubclass(), and to define abstract methods.
We can use the abc.ABCMeta.register() method to make a class MyABC, which is not in the inheritance chain of another class A, a “virtual” base class of the latter.
import abc
class MyABC(metaclass=abc.ABCMeta):
    pass
As we know, A is not a subclass of MyABC:
>>> print('A {} a subclass of MyABC'.format('is' if issubclass(A, MyABC) else 'is not'))
A is not a subclass of MyABC
But once we “register” A with MyABC, so that MyABC becomes a virtual base class of A, we will see that A becomes a subclass of MyABC:
>>> MyABC.register(A)
<class '__main__.A'>
>>>
>>> print('A {} a subclass of MyABC'.format('is' if issubclass(A, MyABC) else 'is not'))
A is a subclass of MyABC
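Besides register(), an ABC can customize the subclass check itself by defining the __subclasshook__() class method. The following sketch treats any class that defines a process() method as a subclass; the names Processor, Worker, and Idle are hypothetical:

```python
import abc

class Processor(abc.ABC):
    """Hypothetical ABC: anything with a process() method counts."""

    @classmethod
    def __subclasshook__(cls, subclass):
        if cls is Processor:
            # Look for process() anywhere in the candidate's MRO.
            if any('process' in vars(base) for base in subclass.__mro__):
                return True
        return NotImplemented  # fall back to the normal check

class Worker:  # does not inherit from Processor
    def process(self):
        return 'working'

class Idle:
    pass

print(issubclass(Worker, Processor), isinstance(Worker(), Processor))
print(issubclass(Idle, Processor))
```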
Abstract Method¶
Using abc, we can add abstract methods to a class (making it abstract).
class AbstractClass(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def process(self):
        pass
An abstract class cannot be instantiated:
>>> a = AbstractClass()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class AbstractClass with abstract method process
In a derived class, the abstract method needs to be overridden:
class GoodConcreteClass(AbstractClass):
    def process(self):
        print('GoodConcreteClass process')
Then the good concrete class can be instantiated and used.
>>> g = GoodConcreteClass()
>>> g.process()
GoodConcreteClass process
If the abstract method is not overridden,
class BadConcreteClass(AbstractClass):
    pass
the derived class cannot be instantiated.
>>> b = BadConcreteClass()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class BadConcreteClass with abstract method process
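Whether a class is still abstract can be checked without trying to instantiate it. This sketch repeats the classes from above so that it runs on its own, and uses the __abstractmethods__ attribute together with inspect.isabstract():

```python
import abc
import inspect

class AbstractClass(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def process(self):
        pass

class GoodConcreteClass(AbstractClass):
    def process(self):
        print('GoodConcreteClass process')

class BadConcreteClass(AbstractClass):
    pass

# Names of the abstract methods that are not yet overridden.
print(sorted(AbstractClass.__abstractmethods__))
print(inspect.isabstract(BadConcreteClass), inspect.isabstract(GoodConcreteClass))
```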
References¶
[1] K. Barrett, B. Cassels, P. Haahr, D. A. Moon, K. Playford, and P. T. Withington, “A monotonic superclass linearization for Dylan,” SIGPLAN Not., vol. 31, no. 10, pp. 69–82, Oct. 1996, doi: 10.1145/236338.236343. https://dl.acm.org/doi/10.1145/236338.236343.