Victor Stinner’s Notes Documentation Release 1.0 Victor Stinner November 21, 2014

Contents
1 My Projects
  1.1 Websites and public profiles
  1.2 Python Projects
  1.3 Other Projects
  1.4 Conferences
  1.5 Documentation
  1.6 Articles on linuxfr.org
2 Old Projects (2000-2009)
  2.1 Websites
  2.2 Conferences
  2.3 Paper Articles
  2.4 Python projects
  2.5 Other projects
  2.6 Workshops Lolut
  2.7 “Childhood”
3 Victor Stinner
  3.1 Contact me
  3.2 My public projects
  3.3 Other people talking about me
  3.4 About me
4 My Contributions to Free Softwares
  4.1 Contributions to Python
  4.2 Old Work (2004-2008)
  4.3 INL/EdenWall
  4.4 Fuzzing
  4.5 Bug reports
  4.6 Other
5 The Python programming language
  5.1 Python 2 or Python 3?
  5.2 Port Python 2 code to Python 3
  5.3 Python packaging
  5.4 Compile Python extensions on Windows
  5.5 Build a Python Wheel package on Windows
  5.6 Python 3 is better than Python 2
  5.7 History of Python releases
  5.8 History of the Python language (syntax)
  5.9 Python for PHP developers
6 Unicode
  6.1 Encodings
  6.2 Python
7 The new Python asyncio module aka “tulip”
  7.1 asyncio projects
  7.2 asyncio event loops
  7.3 Talks about asyncio
  7.4 Tulip projects
  7.5 Low-level libraries
  7.6 Coroutines
  7.7 High-level libraries
  7.8 Python builtin modules
  7.9 Concurrency
  7.10 Issues with eventlet
8 OpenStack
  8.1 Hack OpenStack
  8.2 Tests
  8.3 testr: list skipped tests
  8.4 tox/testr: “db type could not be determined” error
  8.5 Re-run a single failing test
  8.6 Test issues
  8.7 Trollius in OpenStack
9 Time
10 Fragmentation of the Heap Memory
11 Python Memory
  11.1 TODO
  11.2 Benchmarks
  11.3 Memory Fragmentation
  11.4 Python Memory Allocators
  11.5 Windows
  11.6 pymalloc
  11.7 Linux
12 Faster CPython
  12.1 Ideas
  12.2 Plan
  12.3 Status
  12.4 Why Python is slow?
  12.5 Optimizations
  12.6 Learn types
  12.7 Emit machine code
  12.8 Test if the specialized function can be used
  12.9 Kill the GIL?
  12.10 Links
13 Microbenchmarks
  13.1 Memory
  13.2 Help compiler to optimize
  13.3 Aliasing
14 Unsorted Notes
  14.1 vim for developer
  14.2 Windows console
  14.3 Fedora
  14.4 Python
  14.5 libdbus
  14.6 Code search
  14.7 Git remote branches
  14.8 OpenStreetMap
  14.9 Shell script
  14.10 Python zero copy
  14.11 Ftrace
  14.12 MySQL
  14.13 Mercurial: find tags containing a specific changeset
  14.14 Misc
15 Indices and tables
CHAPTER 1
My Projects
If you like my projects, you can donate through the Gratipay website.
See also my old projects.
Note: There is also the prime4commit.com website which supports the Python project: “Donate primecoins to open
source projects or make commits and get tips for it”. I don’t know what primecoins are. It looks like tip4commit
which uses bitcoins.
1.1 Websites and public profiles
Websites:
• Haypo’s Notes (this site, created in 2014)
• Wiki (last new article in December 2013)
• Blog Haypo (last article in 2011)
• Wordpress Blog (last article in 2011)
Public profiles:
• Twitter account
• Bitbucket profile
• Github profile
• Flickr profile (photos)
• Google+ profile
Code statistics:
• My profile on Ohloh: statistics on programming
• My Open Source Report Card (haypo): it appears to be based on GitHub
• Stackalytics: Victor Stinner activity report (OpenStack)
1.2 Python Projects
Active projects:
• Trollius: port of the Tulip project (asyncio module, PEP 3156) to Python 2, an asynchronous input/output
library.
• faulthandler: Dump Python tracebacks explicitly, on a fault, after a timeout, or on a user signal. The module is
part of Python 3.3 and newer.
• pytracemalloc: debug tool to trace memory blocks allocated by Python. The module is part of Python standard
library since Python 3.4, I maintain a backport for Python 2.7 and 3.3 (it should work on Python 2.6-3.3).
• python-ptrace: Python binding of ptrace library to debug processes on UNIX and BSD.
• astoptimizer: experimental optimizer for Python code working on the Abstract Syntax Tree (AST, high-level
representation). It does as much work as possible at compile time.
• pyfailmalloc: Debug tool for Python injecting memory allocation faults to simulate a low memory system to test
how your application handles MemoryError exceptions.
• registervm: My fork of Python 3.3 using register-based bytecode instead of stack-based bytecode. Read REGISTERVM.txt.
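As a minimal sketch of the faulthandler item in the list above, assuming Python 3.3 or newer: the module can dump the traceback of the current thread on request. The helper name dump_current_traceback and the temporary file are illustrative choices, not part of the module.

```python
import faulthandler
import tempfile


def dump_current_traceback():
    """Return the text written by faulthandler.dump_traceback()."""
    # faulthandler writes through the file descriptor, so it needs a real
    # file with a fileno() method; io.StringIO would not work here.
    with tempfile.TemporaryFile(mode="w+") as stream:
        faulthandler.dump_traceback(file=stream, all_threads=False)
        stream.seek(0)
        return stream.read()


traceback_text = dump_current_traceback()
print(traceback_text)
```

In a real program, calling faulthandler.enable() at startup (or running python with -X faulthandler) installs the handlers that dump the traceback on a fault such as a segmentation fault.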
1.3 Other Projects
• Hachoir: Python library to view and edit a binary stream field by field. In other words, Hachoir
lets you “browse” any binary stream just like you browse directories and files. A file is split into a tree of
fields, where the smallest field is just one bit.
• Hasard: pseudo-random number generator (PRNG) library.
• “misc” repository:
– my “dot” files, configuration files: bashrc, hgrc, gitconfig, etc.
– some command-line programs: apply_patch.py, scm.py
– some Python scripts
– some shell scripts: apt_get.sh, fedora_new_install.sh
1.4 Conferences
Conference materials on GitHub:
• 2014, Pycon FR at Lyon (France):
– Exploration de la boucle d’événements asyncio
– slides (PDF)
– slides at SpeakerDeck
• 2014, Pycon “US” at Montréal (Canada):
– Track memory leaks in Python
– slides (PDF)
– slides at SpeakerDeck
– video
• 2013, Pycon FR at Strasbourg (France): “Traquer les fuites mémoires Python”
– slides (PDF)
– slides at slideshare
– video
• 2013, FOSDEM at Brussels (Belgium): “Two projects to optimize Python” (astoptimizer, register-based bytecode)
– slides (PDF)
– slides at slideshare
• 2012, Pycon FR at Paris (France):
– Processus de développement de CPython
– Nouveautés de Python 3.3
• 2011, Pycon US at Atlanta (USA):
– https://github.com/haypo/conf/tree/master/2011-PyconUS-Atlanta
– video
– video
• 2011, Pycon FR at Rennes (France):
– “Développement de CPython”:
* slides (PDF)
* slides at slideshare
– Python : langage homogène, explicite et efficace
1.5 Documentation
• Programming with Unicode. Source code: unicode_book at github.
• CPython Internals. Source code: cpython_internals at github.
1.6 Articles on linuxfr.org
Some of my articles:
• Python 3.4 est sorti avec 7 nouveaux modules (2014/03)
• Justice Free publie enfin ses patchs sur les logiciels libres (2011/09)
• Python 3.2 (2011/02)
• Patch pour le noyau Linux améliorant l’interactivité entre les applications console et Xorg (2010/11)
• Python 2.7 (2010/07)
• Sortie de la version 2.11 de la bibliothèque standard C GNU (glibc) (2009/11)
• Intel ne maintient plus le pilote Linux Poulsbo depuis un an et demi (2009/10)
• Python arrive en version 3.1 (2009/07)
• Debian remplace la glibc par eglibc (2009/05)
• Nouvelle version majeure de Python (2.6) (2008/10)
CHAPTER 2
Old Projects (2000-2009)
See also my current projects!
2.1 Websites
• La page de Haypo (2005)
• Turbo Pascal (2004)
2.2 Conferences
• 2009, Pycon FR at Paris (France):
– Comprendre les erreurs Unicode: slides, video
– Contribuer à Python
– Python bling bling: slides, video
– Interview of myself
• 2009, OSDC at Paris (France):
– Générer des nombres aléatoires avec Hasard.
• 2009, FOSDEM at Brussels (Belgium):
– Fusil the fuzzer
– Video of the demo (fusil-python.ogg)
– Video: FOSDEM 2009 Fusil fuzzing
• 2008, RMLL at Mont de Marsan (France):
– Assurance qualité avec Fusil le fuzzer
– https://github.com/haypo/conf/tree/master/2008-RMLL
• 2008, Pycon FR at Paris (France):
– PyPy
– PyPy: video
– Python 3 aka “Python 3000”
– Python 3: video
• 2007, SSTIC at Rennes (France):
– Project Fusil
• 2007, Pycon FR at Paris (France): https://github.com/haypo/conf/tree/master/2007-Pycon-Paris
• January 2007, AAM (Appel À Mousser) at Strasbourg (France):
– Hachoir
• 2005, UTBM at Belfort (France), Lolut association: PHP and MySQL security workshop
– PHP “crackme” exercises: vulnerable PHP pages
– Failles en PHP et injection SQL
• 2005, UTBM at Belfort (France), Lolut association: C programming security workshop
– Introduction générale à la sécurité informatique (in French)
– Mots de passe, chiffrement et signature
– Aide-mémoire sur les failles en C
– C exercises: vulnerable C programs
• 2005, Gameover at Limoges (France):
– Wormux (in French), talk given with Lawrence Azzoug.
2.3 Paper Articles
• Netfilter et le filtrage du protocole IPv6 (in French): GNU/Linux Magazine HS 41 (April 2009)
• Hors-série Linux Mag : Explorez les richesses du langage Python (January/February 2009). I wrote 4 articles:
– Nouveautés de Python 2.6
– Nouveautés de Python 3.0
– Trucs et astuces
– Ctypes et Python
• “Pratique du fuzzing avec Fusil” (in French), MISC magazine n°39 (September 2008)
• “Comment réaliser un fuzzer ?” (in French), MISC magazine n°36 (March 2008)
– http://www.haypocalc.com/blog/index.php/2008/03/10/136-comment-raliser-un-fuzzer
2.4 Python projects
• Fusil: Fusil is a Python library used to write fuzzing programs. It includes fuzzers for Firefox, ClamAV, Mplayer,
Python, etc. I am no longer actively working on the project, but it still works.
• pysandbox: Sandbox to run untrusted Python code. Project stopped because it is broken by design.
• python-ipy: Python classes and tools for handling IPv4 and IPv6 addresses and networks. I don’t need this
module anymore, so I am no longer interested in maintaining it; the new maintainer is Jeff Ferland aka autocracy.
Python 3.3 now includes ipaddress, a competing module to handle IP addresses and networks.
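For readers who have not used the ipaddress module mentioned in the last item, here is a small sketch, assuming Python 3.3 or newer. The addresses are arbitrary examples taken from private and documentation ranges.

```python
import ipaddress

# Parse a network and a host address from their text form.
network = ipaddress.ip_network("192.168.0.0/24")
address = ipaddress.ip_address("192.168.0.42")
ipv6_address = ipaddress.ip_address("2001:db8::1")

print(address in network)       # membership test
print(network.num_addresses)    # number of addresses in the /24
print(ipv6_address.version)     # IP protocol version of the address
```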
2.5 Other projects
• Warmux, previously known as “Wormux”: Open source clone of the famous 2D game “Worms” by Team17.
• macfly: tool to run one or more programs with a clock shifted relative to the system clock. Project
written at INL for the French CNES (National Centre for Space Studies).
• happyboom: Prototype of a library to write a turn-based game; it is mostly a network protocol.
• HaypoCALC: A symbolic calculator for the text terminal. Available functions: ln, lg, basen, gcd, lcm, ncr, npr, factor(int), cos, sin, tan, acos, asin, atan, derive, taylor, ... Works on Linux and Windows; written in C++.
2.6 Workshops Lolut
As a member of the Lolut club, I organized several workshops:
• In preparation: security workshops during the 2005/2006 semester (see the autumn 2005 workshops)
• Security workshop: general introduction and vulnerabilities in the C language (6 October 2005): report, slides, exercises and links are available.
• Workshop on building websites with XML/XSL/CSS (8 April 2004)
• Report on the Gimp workshop (25 March 2004)
• “C/C++ programming on Linux” (18 April 2002): no report, unfortunately.
• “PHP workshop” (9 January 2003): no report, but the documents and the examples are available. I organized it with Damien Boucard and Laurent Adda (the latter left UTBM the same year...).
2.7 “Childhood”
I like programming. I started with examples copied by hand from the magazine “Science & Vie Junior”, programs written in “Basic” (Microsoft Quick Basic).
Then I moved on to the programming language “Pascal” (with Turbo Pascal 7), complemented by Intel x86
assembler (Borland Turbo Assembler 1 and 2).
The other major change was the move to object-oriented programming (OOP) with Borland C++ Builder and short
tests with gcc on MS-DOS (it didn’t work well). When I started to study at UTBM (engineering school), I switched from
Windows to Linux because the 4 computers running Linux were always available and I didn’t have the Internet at
home.
I learnt HTML and XHTML (complemented by JavaScript), then PHP, and finally XML + XSLT + CSS. I learnt some
Java at school, and Python at home.
I also developed with Visual Basic and Delphi, but I didn’t do anything interesting with them.
CHAPTER 3
Victor Stinner
3.1 Contact me
• Email: victor.stinner@gmail.com
• IRC: my nickname is haypo on Freenode and OFTC servers
3.2 My public projects
See the page of my projects.
3.3 Other people talking about me
• Montreal Python User Group: Person of the Month: Victor Stinner #MP42
• developer of note: Victor Stinner by Tshepang Lekhonkhobe
3.4 About me
• Twitter: “Opensource Python Hacker”
• Pycon US 2014: Python core developer since 2010, I’m the author of various Python applications and
libraries. See my profiles on Bitbucket and Github. I’m now working on OpenStack at eNovance (Paris).
• FOSDEM 2013: Python core developer for 3 years, I love hacking free software to improve it, especially
the Python project.
• techs.enovance.com: Victor is a Senior Engineer at eNovance. Python core developer since January 2010, he
eats bugs for breakfast and reviews patches for dinner. In his free time, he shares code of open source projects
on Bitbucket and Github.
CHAPTER 4
My Contributions to Free Softwares
First, see my projects.
4.1 Contributions to Python
4.1.1 Major work
• Python 3.4:
– new tracemalloc module (PEP 454)
– better handling of MemoryError exceptions
• Python 3.3:
– new faulthandler module
– new time functions: time.monotonic, time.perf_counter, time.process_time (PEP 418)
• Unicode support: most work done during development of Python 3.1-3.3
• Early work on Unicode before Python 3 in the “Python 3000” branch
• Fuzzing
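To illustrate the tracemalloc module listed above, here is a minimal sketch assuming Python 3.4 or newer. It compares two snapshots to find the source lines that allocated the most memory in between; the 1 MB list is only an example workload.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Example workload: roughly 1 MB of bytes objects kept alive in a list.
data = [bytes(1000) for _ in range(1000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences grouped by source line, biggest changes first.
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)
```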
4.1.2 My accepted PEPs
• PEP 454: Add a new tracemalloc module to trace Python memory allocations (Python 3.4)
• PEP 446: Add new parameters to configure the inheritance of files and for non-blocking sockets (Python 3.4).
See also PEP 433 (Easier suppression of file descriptor inheritance), which was the previous attempt.
• PEP 445: Add new APIs to customize Python memory allocators (Python 3.4)
• PEP 418: Add monotonic time, performance counter, and process time functions (Python 3.3)
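A short sketch of what PEP 446 changed in practice, assuming Python 3.4 or newer: file descriptors created by Python are non-inheritable by default, and os.get_inheritable() / os.set_inheritable() read and change the flag. The pipe is just a convenient way to obtain fresh file descriptors.

```python
import os

read_fd, write_fd = os.pipe()
try:
    # Since Python 3.4, newly created file descriptors are non-inheritable.
    inheritable_by_default = os.get_inheritable(read_fd)
    # Opt in explicitly when a child process must inherit the descriptor.
    os.set_inheritable(read_fd, True)
    inheritable_after_set = os.get_inheritable(read_fd)
finally:
    os.close(read_fd)
    os.close(write_fd)

print(inheritable_by_default, inheritable_after_set)
```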
4.1.3 My rejected PEPs
• PEP 416 (rejected): Add a frozendict builtin type
• PEP 410 (rejected): Use decimal.Decimal type for timestamps
• PEP 400 (deferred): Deprecate codecs.StreamReader and codecs.StreamWriter
4.1.4 Old contributions to Python
Accepted patches:
• 2008-07-06: invalid ref count on locale.strcoll() error. Patch applied in revision 65134.
• 2008-07-09: bugs in scanstring_str() and scanstring_unicode() of _json module. A patch inspired by mine was committed
in revision 65147.
• 2008-07-06: segfault on gettext(None). Patch applied in revision 65133.
• 2008-07-07: bugs in _sqlite module. Patch applied in revision 65040.
• 2008-07-06: Use Py_XDECREF() instead of Py_DECREF() in MultibyteCodec and MultibyteStreamReader.
Patch applied in revision 65038.
• 2008-07-07: dlopen() error with no error message from dlerror(). Patch applied in revisions 64976, 64977 and
64978.
• 2008-07-07: missing lock release in BZ2File_iternext(). Applied in commit 64767.
• 2008-07-06: DoS when lo is negative in bisect.insort_right() / _left(). Applied in commit 64845.
• 2008-07-06: audioop.findmax() crashes with negative length. Applied in commit 64775.
• 2008-07-06: invalid call to PyMem_Free() in fileio_init(). Applied in commit 64758.
• 2007-08-13: Improved patches for sndhdr and imghdr
• 2007-08-10: Fix the ctypes tests: fixes ctypes for the switch from str/unicode to bytes/str.
• 2007-04-10: Segfaults when memory is exhausted (bug report with patch) => patch applied (with a
slight change) in commit 54757 (by georg.brandl).
• 2007-02-27: trace.py needs to know about doctests. Patch applied on 23 Nov 2007.
• 2006-09-06: Bug in locale.getdefaultlocale(): when the _locale module is missing, locale.getdefaultlocale() returns a wrong charset with my locales. Fixed in Python 2.5.1.
• 2006-08-23: Bug report with patch: the setup() function of the distutils module rejected a tuple (instead of a list)
for the “register” command (the patch was reworked to work on Python 2.1).
• 2005-11-25: bug report + patch. The seek(0,2) method of a bz2 module object was buggy in Python 2.4.2.
Pending patches:
• 2008-07-09: _multiprocessing.Connection() doesn’t check handle
• 2008-07-06: block operation on closed socket/pipe for multiprocessing
• 2008-07-06: invalid check of _bsddb creation failure
• 2008-07-06: invalid object destruction in re.finditer()
• 2007-07-23: Unable to register or upload project (http error 302: moved)
• 2007-07-17: Problem with socket.gethostbyaddr() and KeyboardInterrupt
4.2 Old Work (2004-2008)
First, see my old projects.
4.2.1 Accepted patches in other projects
• 2008-05-08, PyPy: pwd and syslog modules implemented with ctypes (well, now I have a Subversion account
at PyPy, so I accept my own contributions :-))
• 2008-03-05, PyPy: _locale module implementation in ctypes
• 2008-02-21, PyPy: resource module implementation using ctypes
• 2007-12-03, Apache: Fix XSS in error page #413. See the commit in Subversion.
• 2006-09-06, PyPy: Fix the codec module for the case of charset names (to be compatible with CPython)
• 2006-08-21, urwid: “setuptools” patch (applied in version 0.9.6)
• 2006-04-27, Dia: http://bugzilla.gnome.org/show_bug.cgi?id=334771 Patch fixing a random crash
when “ungrouping” an object (applied in Dia 0.95)
• 2005-06-16, Gnome: Patch for libgnomeui. Nautilus used 500 MB of memory to generate a thumbnail
of a 28 KB SVG image! My patch keeps the memory waste to a minimum. (applied in version
2.11)
4.2.2 Pending patches
• 2008-07-07, PHP: count_chars() crashes if both arguments are the same reference
• 2007-08-16, yui: container css: “cursor: pointer” instead of “cursor: hand”
4.3 INL/EdenWall
During my work at INL/EdenWall, I contributed to many open source projects:
• 2007, iptables: #7080: Don’t silently exit on failure to open /proc/net/{ip,ip6}_tables_names
• libnfnetlink: #6741: fix autogen.sh (sh syntax for string comparison)
• libnetfilter_conntrack: #6721: fix a crash on setting the counters of a conntrack, implement getter for the
ATTR_USE attribute
• 2006, libnetfilter_conntrack: #6719: Fix XML output syntax
• libnfnetlink: #6718: Initialize callback structure
• libnetfilter_conntrack: #6716: Fix new API test program (replace ntohs by htons), introduce NFCT_O_PLAIN
flag
• gcrypt (July 2006): Fix missing initializer warning in gcrypt.h
• Microoptimize destruction of unused statically initialized mutexes
• 2005, (lxml library) Invalid use of xmlIO: crash on xmlCharEncCloseFunc()
• (CPython) Bugfix for crashes on low-memory conditions
• (Python ctypes) ctypes: wrong calling convention for _string_at. See issue #3554; #3900 was a duplicate of this
bug :-/
• PHP: bug report #42817
• Dia: Bug #334771 (Ungroup crashes) fixed
• libc: Bug report made by Victor Stinner: vfprintf() segfault with multibyte string and long precision. Ulrich
Drepper fixed the bug: see vfprintf patch v1.136
Security vulnerabilities:
• 2007-05-22: CVE-2007-2754: FreeType Integer Overflow in TT_Load_Simple_Glyph()
• 2007-05-11: CVE-2007-2650: ClamAV OLE2 Parser Denial of Service
• 2007-05-10: CVE-2007-2645: Libexif Integer Overflow Vulnerability in exif_data_load_data_entry()
4.4 Fuzzing
Thanks to my project Fusil, I found and sometimes fixed many bugs in various programs. See the list of crashes found
by Fusil.
4.5 Bug reports
Fixed:
• 2007-05-07, ImageMagick: Crash in EXIF parser with invalid IFD count. The file also crashes the gwenview application.
• 2007-04-30, libc: vfprintf() segfault with multibyte string and long precision.
– The bug was fixed by Ulrich Drepper: vfprintf patch v1.136
– Fedora Core bug report
– Debian bug report
• 2007-04-28, FreeType: Another bug in TTF (cmap), see the patch sfnt/sfobjs.c version 1.128
• 2007-04-27, FreeType: Bug in fuzzed TTF file. See the patch (in CVS).
Open:
• 2008-02-21: PyPy, large-file support and file.seek()
• 2008-01-28: Firefox, Venkman crashes on profiling after clearing profile data
• 2008-01-28: command-not-found, phpize is missing from program.d database
• 2007-10-01: PHP, buffer under- and overflow on clone(null)+array_push()
– Diff on zend_vm_execute.h
– Regression tests: bug36071.phpt, bug42817.phpt, bug42818.phpt
• 2007-07-05, ClamAV:
– #561: OLE2: Long (slow) loop in ole2_walk_property_tree() with huge prop_index value
– #560: bitset_realloc() is not atomic (with patch and test case)
– #559: OLE2: Allocate too much memory with invalid file (with patch and test case)
• 2007-04-18, ClamAV: Bug in OLE2 file parser (DoS found with fuzzing), in Bugzilla: Bug #466 (closed to the
public)
• 2007-04-20, ImageMagick: Bug report in TGA and XCF files (DoS found with fuzzing)
• 2005-06-16, gdb : Display libc function names instead of address?
4.6 Other
• I contributed to some articles on the French Wikipedia, like: Sténographie.
CHAPTER 5
The Python programming language
• Homepage of the Python project
• Unicode in Python 2 and Python 3
5.1 Python 2 or Python 3?
My article “Why should OpenStack move to Python 3 right now?” explains why you should move to Python 3 right now.
5.2 Port Python 2 code to Python 3
• six module
• 2to6
• Language differences and workarounds (python3porting book)
• Porting Python 2 Code to Python 3 (docs.python.org)
• python-porting mailing list
• getpython3.com
• Porting code to Python 3 (Python.org wiki)
• python-incompatibility
• 2to3c
• py3to2
• Python 3 Wall of Superpowers
• Can I Use Python 3?
5.3 Python packaging
• Install pip
• pip documentation
• Python Packaging User Guide
• Python Wheels
5.4 Compile Python extensions on Windows
• Install Windows SDK
• Run “Windows SDK Command Prompt”
• Type:
setenv /x64 /release
set MSSDK=1
set DISTUTILS_USE_SDK=1
5.5 Build a Python Wheel package on Windows
• Install pip
• Install wheel using pip:
\python27\python.exe -m pip install wheel
• Run “Windows SDK Command Prompt”
• Set up the environment to build code in 64-bit mode (replace /x64 with /x86 for 32-bit):
setenv /x64 /release
set MSSDK=1
set DISTUTILS_USE_SDK=1
• Go to your project
• Clean up the project (is it really needed?):
del build\*
del dist\*
• Build the wheel and upload it:
\python27\python.exe setup.py bdist_wheel upload
Notes:
• To build a 32-bit wheel, you need a 32-bit Python and must configure the SDK using /x86.
• Python 2.7 requires the Windows SDK v7.0 because Python 2.7 is built using Visual Studio 2008 (MSVCR90).
Python 3.3 is built using Visual Studio 2010.
• It looks like Python 3.3 doesn’t need MSSDK and DISTUTILS_USE_SDK environment variables anymore.
5.6 Python 3 is better than Python 2
• 10 awesome features of Python that you can’t use because you refuse to upgrade to Python 3
5.6.1 Bugs that won’t be fixed in Python 2 anymore
Unicode
The Unicode support of Python 3 is much better than in Python 2. Many Unicode issues were closed as “won’t
fix” in Python 2, especially issues opened after the release of Python 3.0. Some examples:
• Outputting unicode crushes when printing to file on Linux
• stdout.encoding not set when redirecting windows command line output
Bugs in the C stdio (used by the Python I/O)
Python 2 uses the buffered I/O API of the C standard library: fopen(), fread(), fseek(), etc. This API has many bugs.
Python works around some bugs, but some others cannot be fixed (in Python). Examples:
• Issue #21638: Seeking to EOF is too inefficient!
• Issue #1744752: end-of-line issue on Windows on file larger than 4 GB
• Issue #683160: Reading while writing-only permissions on Windows
• Issue #2730: file readline w+ memory dumps
• Issue #228210: Threads using same stream blow up (Windows)
Python 3 has a much better I/O library: the io module which uses directly system calls like open(), read() and
lseek().
Hash DoS
The hash function of Python 2 has a “worst complexity” issue which can be exploited for a denial of service (DoS).
It’s called the “hash DoS” vulnerability. Python 3.3 randomizes the hash function by default, Python 2.7 can use
randomized hash if enabled explicitly. But the real fix is in Python 3.4 with the PEP 456 which now uses the new
SipHash hash function which is much safer.
Monotonic clocks
Timeouts must not use the system clocks but a monotonic clock. It is explained in the PEP 418 which has been
implemented in Python 3.3.
In Python 3.2, locks got a new optional timeout parameter which uses the native OS function.
Example of issue with system clock changes: threading.Timer/timeouts break on change of win32 local time.
See also the PEP 418 for a list of issues related to the system clock.
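The rule above can be sketched as a polling loop whose deadline is computed with time.monotonic() (Python 3.3 and newer). The helper name wait_until and its parameters are hypothetical, chosen only for this illustration:

```python
import time

def wait_until(predicate, timeout, poll_interval=0.01):
    """Poll predicate() until it returns True or timeout seconds elapse.

    wait_until is a hypothetical helper, not part of the standard library.
    The deadline is computed with time.monotonic(), so an update of the
    system clock (NTP, manual change) cannot shorten or extend the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep longer than the time left before the deadline.
        time.sleep(min(poll_interval, remaining))
```

The same loop written with time.time() would misbehave if the system clock jumped while it was running.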
Other bugs
Race conditions in the subprocess module:
• subprocess is not thread-safe in Python 2. Bad things happen if Python code is executed between fork and exec,
which is possible.
• subprocess.Popen hangs when child writes to stderr
• Doc: subprocess should warn uses on race conditions when multiple threads spawn child processes
Misc bugs:
• Ctrl-C doesn’t interrupt simple loop: require the new GIL introduced in Python 3.2
5.6.2 Port Python 3 code to Python 2
Notes based on my experience of porting Tulip to Python 2 (Trollius project).
• Remove keyword-only parameters: replace def func(*, loop=None): ... with def func(loop=None): ...
• super() requires the class and self, and the class must inherit from object
• A class must inherit explicitly from object to use properties and super(), otherwise super() fails with a
cryptic “TypeError: must be type, not classobj” message.
• Python 2.6: str.format() doesn’t support {}.
For example, "{} {}".format("Hello",
"World") must be written "{0} {1}".format("Hello", "World").
• Replace list.clear() with del list[:]
• Replace list2 = list.copy() with list2 = list[:]
• Python 3.3 has new specialized OSError exceptions: BlockingIOError, InterruptedError,
TimeoutError, etc. Python 2 has IOError, OSError, EnvironmentError, WindowsError,
VMSError, mmap.error, select.error, etc.
• raise ValueError("error") from None should be replaced with raise ValueError("error")
• memoryview should be replaced with buffer
Major changes between Python 2.6 and 3.3:
• threading.Lock.acquire() and subprocess.Popen.communicate() support timeout. A busy
loop can be used for threading.Lock.acquire() (non-blocking call + sleep) in Python 2.
• time.monotonic() (3.3)
• set and dict literals
• memoryview object
• collections.OrderedDict (2.7, 3.1)
• weakref.WeakSet (2.7, 3.0)
• argparse
• Python 2 doesn’t support ssl.SSLContext nor certificate validation
• ssl module: SSLContext, SSLWantReadError, SSLWantWriteError, SSLError
• Python 2 does not support yield from and does not support return in generators (3.3)
• Python 2 doesn’t support the nonlocal keyword: use mutable types like list or dict instead (3.0)
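Two of the workarounds listed above can be sketched as follows: a busy loop emulating the timeout of threading.Lock.acquire(), and a closure using a mutable list instead of nonlocal. The helper names acquire_with_timeout and make_counter are hypothetical; the code is written to run on both Python 2 and Python 3:

```python
import threading
import time

def acquire_with_timeout(lock, timeout, delay=0.001):
    """Emulate lock.acquire(timeout=...) with a non-blocking call + sleep.

    Hypothetical helper. time.time() is used because Python 2 has no
    time.monotonic(); the timeout is therefore sensitive to system
    clock changes.
    """
    deadline = time.time() + timeout
    while True:
        if lock.acquire(False):      # non-blocking call
            return True
        if time.time() >= deadline:
            return False
        time.sleep(delay)

def make_counter():
    """Closure without nonlocal: the counter is stored in a mutable list."""
    count = [0]
    def increment():
        count[0] += 1                # mutates the list, no rebinding needed
        return count[0]
    return increment
```

The busy loop wakes up every delay seconds, so it trades some CPU time and latency for portability.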
New modules in the standard library between Python 2.6 and Python 3.3:
• concurrent.futures (3.2)
• faulthandler (3.3)
• importlib (3.1)
• ipaddress (3.3)
• lzma (3.3)
• tkinter.ttk (3.1)
• unittest.mock (3.3)
• venv (3.3)
Python 3.4 has even more modules:
• asyncio
• enum
• ensurepip
• pathlib
• selectors
• statistics
• tracemalloc
5.7 History of Python releases
• Python 3.4: March 2014
• Python 3.3: September 2012
• Python 3.2: February 2011
• Python 2.7: July 2010
• Python 3.1: June 2009
• Python 3.0: December 2008
• Python 2.6: October 2008
• Python 2.5: September 2006
• Python 2.0: October 2000
• Python 1.5: April 1999
5.8 History of the Python language (syntax)
• (Python 3.4: no change)
• Python 3.3:
– yield from: PEP 380 “Syntax for Delegating to a Subgenerator”
– u’unicode’ syntax is back: PEP 414 “Explicit Unicode literals”
• (Python 3.2: no change)
• Python 2.7:
– all changes of Python 3.1
• Python 3.1:
– dict/set comprehension
– set literals
– multiple context managers in a single with statement
• Python 3.0:
– all changes of Python 2.6
– new nonlocal keyword
– raise exc from exc2: PEP 3134 “Exception Chaining and Embedded Tracebacks”
– print and exec become a function
– True, False, None, as, with are reserved words
– Change from except exc, var to except exc as var: PEP 3110 “Catching Exceptions in
Python 3000”
– Removed syntax: a <> b, ‘a‘, 123l, 123L, u’unicode’, U’unicode’ and def func(a,
(b, c)): pass
• Python 2.6:
– with: PEP 343 “The “with” Statement”
– b’bytes’ syntax: PEP 3112 “Bytes literals in Python 3000”
5.9 Python for PHP developers
• http://www.php2python.com/
• http://www.pythonphp.org/ (french)
CHAPTER 6
Unicode
Read my free ebook: Programming with Unicode!
6.1 Encodings
• Windows
– OEM code page: used by stdin, stdout and stderr in the Windows console
– ANSI code page: used by all other Windows “ANSI” functions. Some examples: filenames, command
line arguments, environment variables, etc.
• UNIX: “Locale encoding”
– LC_CTYPE locale
– used for filenames, command line arguments, environment variables, the console (stdin, stdout, stderr)
• Common encodings:
– UTF-8
– ISO 8859-1 aka Latin1 (Windows code page 1252 is close to it, but not identical)
– ASCII
6.2 Python
See my talk (in French) “Comprendre les erreurs Unicode” (Pycon FR 2009 in Paris): slides (PDF) and video.
6.2.1 Narrow and wide builds, PEP 393
Python 3.3 introduced the Flexible String Representation (PEP 393) and supports the whole Unicode range (U+0000
- U+10ffff) on all platforms.
Older Python versions had a “narrow or wide” compilation option:
• UNIX and Mac OS X use wide mode: the unicode type uses 32-bit code units. In Unicode, it is called the
UCS-4 encoding.
• Windows uses narrow mode: the unicode type uses 16-bit code units; non-BMP characters (Unicode range
U+10000 - U+10ffff) are stored as a surrogate pair (two 16-bit code units). In Unicode, it is called the
UTF-16 encoding. This mode is preferred on Windows because the Windows kernel also uses the UTF-16 encoding internally.
Use sys.maxunicode == 0xffff to check if Python is compiled in narrow mode. Otherwise,
sys.maxunicode is equal to 0x10ffff.
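A minimal sketch of this check; the helper name is_narrow_build is hypothetical. On a narrow build a non-BMP character is stored as a surrogate pair, so its length is 2; on a wide build, and on any Python 3.3 or newer, it is 1:

```python
import sys

def is_narrow_build():
    """Return True if Python was compiled in narrow (UTF-16) mode."""
    return sys.maxunicode == 0xffff

# U+1F40D is outside the BMP: a narrow build stores it as two
# 16-bit code units, a wide build as a single 32-bit one.
non_bmp = u"\U0001F40D"
expected_length = 2 if is_narrow_build() else 1
```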
6.2.2 Python 2
• str type and "abc" are strings of bytes, unicode type is a string of characters
• “Default encoding”
– sys.getdefaultencoding()
– used by unicode.encode() and str.decode() when no encoding is specified
– ASCII by default, must not be modified (sys.setdefaultencoding())
• File system encoding
– sys.getfilesystemencoding()
– used to encode filenames and environment variables
– used on UNIX by os.listdir(unicode) to decode filenames
– ANSI code page (mbcs) on Windows, utf-8 on Mac OS X, the locale encoding on UNIX
• Locale encoding
– locale.getpreferredencoding()
– used by default by io.TextIOWrapper
– ANSI code page on Windows, LC_CTYPE locale on UNIX
• OEM code page (Windows only)
– sys.stdin.encoding, sys.stdout.encoding and sys.stderr.encoding
6.2.3 Python 3
• bytes type is a string of bytes, str type and "abc" are strings of characters
• UTF-8
– used for the default encoding of the source code
• “Locale encoding”
– locale.getpreferredencoding()
– ANSI code page on Windows, LC_CTYPE locale on UNIX
– used by sys.stdin, sys.stdout, sys.stderr, and by default by open() (and io.TextIOWrapper)
• “File system encoding”
– sys.getfilesystemencoding()
– ANSI code page (mbcs) on Windows, utf-8 on Mac OS X, the locale encoding on UNIX
– used for filenames, command line arguments, environment variables
• “Default encoding”
– sys.getdefaultencoding(), hardcoded to utf-8
– used by bytes.decode() and str.encode() when no encoding is specified
• OEM code page (Windows only)
– sys.stdin.encoding, sys.stdout.encoding and sys.stderr.encoding
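The encodings listed above can be inspected at runtime. The sketch below targets Python 3; the function name describe_encodings and the dictionary keys are hypothetical, chosen for illustration:

```python
import locale
import sys

def describe_encodings():
    """Collect the encodings listed above in a dictionary (Python 3)."""
    return {
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "locale": locale.getpreferredencoding(False),
        # sys.stdout may be replaced by an object without an encoding
        # attribute (ex: when the output is captured), hence getattr().
        "stdout": getattr(sys.stdout, "encoding", None),
    }

if __name__ == "__main__":
    for name, encoding in sorted(describe_encodings().items()):
        print("%s: %s" % (name, encoding))
```

Only the "default" entry is the same everywhere; the other values depend on the platform and on the locale.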
CHAPTER 7
The new Python asyncio module aka “tulip”
Asynchronous programming has become very popular. JavaScript has node.js, Erlang has XXX, and
Go has goroutines and channels. What about Python?
7.1 asyncio projects
• Python 3.4: asyncio documentation
• Python 3.3: Tulip project homepage
• Python 2.x: Trollius
7.2 asyncio event loops
• ZeroMQ
• geventreactor
• gevent3
• greenlet https://github.com/1st1/greentulip
• libuv: rose, a PEP-3156 compatible event loop
• Qt:
– Mark Harviston’s PEP 3156 Event-Loop with Qt: https://github.com/harvimt/quamash
– Twisted qt4reactor: https://github.com/ghtdak/qtreactor/blob/master/qt4reactor.py
– Twisted qt4reactor: http://bazaar.launchpad.net/~qt4reactor-dev/qt4reactor/trunk/view/head:/qt4reactor.py#L55
• Tk:
– See Dino Viehland ‘s talk at Pycon US 2013
– https://us.pycon.org/2013/schedule/presentation/62/
– http://www.youtube.com/watch?v=oJQdX_w1vXY
• Twisted: asyncio_twisted.py
• wxPython: http://twistedmatrix.com/documents/12.3.0/core/howto/choosing-reactor.html#auto13
and http://wiki.wxpython.org/wxPythonAndTwisted
• Tornado: experimental asyncio support built right into it.
• Glib: gbulb; include Gtk+ and GApplication event loops (suitable for GTK+ applications)
7.3 Talks about asyncio
• “Tulip: Async I/O for Python 3” by Guido van Rossum, at LinkedIn, Mountain View, Jan 23, 2014
– Record of a Google Hangout (2h22)
– Slides
• “Tulip: Async I/O for Python 3” by Guido van Rossum, Oct 29, 2013 at Twitter University for the San Francisco
Python User Group
– Youtube video
• Tulip or not Tulip by Jose Ignacio Galarza, Pycon Spain 2013, Nov 26, 2013. Nice introduction to Tulip.
• PEP-3156: Async I/O en Python (Spanish), by Saúl Ibarra Corretgé, Pycon Spain 2013, Nov 2013.
• “PyCon 2013 Keynote” by Guido van Rossum, Mar 20, 2013
– Youtube video
– LWN report: PyCon: Asynchronous I/O
• “Python Async IO Horizon” by Lukasz Dobrzanski, Jan 17, 2014
– Slides at Slideshare
• “A deep dive into PEP-3156 and the new asyncio module” by Saúl Ibarra Corretgé, Feb 03, 2014 (FOSDEM,
Brussels)
– Slides at Slideshare
7.4 Tulip projects
• aiohttp: HTTP client/server for asyncio (PEP-3156)
• irc3: pluggable IRC client based on Python’s asyncio
• rainfall: Here’s one more framework =) It allows using @asyncio.coroutine for handlers and supports jinja2
templates
• Vase
7.5 Low-level libraries
What?
• sockets (network), stream (TCP) and datagram (UDP)
• pipes (subprocesses)
• files
• signals
Operating system synchronous I/O multiplexing, I/O event notification facility:
• select
• poll
• epoll: Linux
• kqueue: FreeBSD, Mac OS X, NetBSD, OpenBSD, DragonflyBSD
• devpoll: Solaris
• Windows proactor (IOCP)
C libraries:
• libuv
• libev
• libevent, libevent2
Python libraries:
• pyuv (libuv)
• pyev (libev)
• pyevent (libevent)
Common features:
• asynchronous DNS resolution
• multiplexing
7.6 Coroutines
Conference:
• An outsider’s look at co-routines by Peter Portante, Pycon US 2011
Projects:
• Toro: Tornado coroutines
• The difference between yield and yield-from
• gevent: coroutine-based Python networking library
• greenlet: spin-off of Stackless, a version of CPython that supports micro-threads called “tasklets”
• fibers: lightweight concurrent multitasking
7.7 High-level libraries
• gunicorn (sync, eventlet, gevent, tornado): Gunicorn ‘Green Unicorn’ is a Python WSGI HTTP Server for UNIX
• diesel
• uwsgi
• concurrence
• Tornado: web framework and asynchronous networking library, “ideal for long polling, WebSockets”
• Twisted: event-driven networking engine
• eventlet
• ZeroMQ
• gruvi (documentation): Synchronous evented IO with pyuv and fibers, based on the PEP 3153: Transportprotocol
• App Engine NBD by Guido van Rossum
• obelus by Antoine Pitrou: Protocol implementation of the Asterisk Manager Interface and Asterisk Gateway
Interface
7.8 Python builtin modules
• multiprocessing: Python 2.6
• concurrent.futures: Python 3.2 (Pool of threads/processes)
7.9 Concurrency
• Unyielding by Glyph, a Twisted developer, February 2014
• The Secret to 10 Million Concurrent Connections -The Kernel is the Problem, Not the Solution
7.10 Issues with eventlet
• SQLAlchemy: MySQLdb + eventlet = sad
• OpenStack reaction when adding sleep(0) fixes an eventlet test
• Read “What’s wrong with eventlet?” section of Use the new asyncio module and Trollius in OpenStack
CHAPTER 8
OpenStack
8.1 Hack OpenStack
• Gerrit Workflow
– To rerun tests: add a comment recheck no bug, or reverify no bug if the change was approved.
• Zuul Status (other link: Zuul Status, raw output?)
• Hacking Guide
• Gerrit: user search documentation
• Code Review
• GIT Commit Good Practice
• Test Repository
• elastic recheck
8.2 Tests
Tools:
• testr: test runner to run tests in parallel; part of testrepository
• testtools: extensions to the Python standard library unit testing framework (ex: testtools.matchers)
• testscenarios: pyunit extension for dependency injection
• subunit: subunit test (binary) streaming protocol
• testrepository: database of test results
• tox: create a virtual environment and run tests
See also:
nose usage
Run unit tests:
. .tox/py27/bin/activate
testr run
Shell commands to run unit tests:
set -e && \
TEMP_REZ=‘mktemp -t‘ && \
python setup.py testr --slowest --testr-args=’--subunit ’ \
| tee $$TEMP_REZ | subunit2pyunit || true ; \
cat $$TEMP_REZ | subunit-filter -s --no-passthrough | subunit-stats ; \
rm -f $$TEMP_REZ ;
• --slowest shows the statistics at the end of the test run. Nothing fancy.
• --testr-args='--subunit ' tells testr to output its test results in the subunit2 format. subunit2 is a
binary format, which you shouldn’t output to the screen.
• subunit2pyunit converts that to a nicer output.
• tee $$TEMP_REZ saves a copy of the binary stream. Then cat $$TEMP_REZ | subunit-filter -s --no-passthrough |
subunit-stats shows statistics about the test run (ex: total number of tests, and how many were skipped,
failed or succeeded).
8.3 testr: list skipped tests
testr last --subunit|subunit-filter -s|subunit-ls >A
testr last --subunit|subunit-filter -s --no-skip|subunit-ls >B
diff -u A B
8.4 tox/testr: “db type could not be determined” error
testr uses a database to store test results. If the database is created by Python 2, Python 3 cannot read it and then you
get the error “db type could not be determined”.
Workaround: remove the .testrepository directory and rerun tox.
8.5 Re-run a single failing test
8.5.1 testtools
Re-run a single test with testtools:
$ tox -e py33
...
FAIL: tests.test_swiftclient.TestPutObject.test_unicode_ok
...
$ . .tox/py33/bin/activate
$ python -m testtools.run tests.test_swiftclient.TestPutObject.test_unicode_ok
Tests running...
======================================================================
FAIL: tests.test_swiftclient.TestPutObject.test_unicode_ok
----------------------------------------------------------------------
...
Ran 1 test in 0.002s
8.5.2 tox
Re-run a single test with tox+testr:
$ tox -e py33
...
FAIL: tests.test_swiftclient.TestPutObject.test_unicode_ok
...
$ tox -e py33 -- --isolated tests.test_swiftclient.TestPutObject.test_unicode_ok
...
FAIL: tests.test_swiftclient.TestPutObject.test_unicode_ok
...
Note: Entering the virtualenv and typing testr run tests.test_swiftclient.TestPutObject.test_unicode_ok
should work, but it doesn’t in the Python 3.3 virtual environment of python-sphinxclient?!
8.5.3 nose
Re-run a single test with nose:
$ nosetests
...
======================================================================
FAIL: tests.test_command_helpers.TestStatHelpers.test_stat_account_human
----------------------------------------------------------------------
...
$ nosetests tests.test_command_helpers:TestStatHelpers.test_stat_account_human
8.6 Test issues
• Cryptic error from subunit when an import fails
– subunit: https://code.launchpad.net/~alexei-kornienko/subunit/bug-1271133
– testrepository: https://code.launchpad.net/~alexei-kornienko/testrepository/bug-1271133
– testtools: Added verbose error information
* Python: No introspective way to detect ModuleImportFailure in unittest
• https://code.launchpad.net/~sileht/testscenarios/testscenarios/+merge/211038
8.7 Trollius in OpenStack
8.7.1 Roadmap
Read the email thread [openstack-dev] [oslo] Asyncio and oslo.messaging.
• Add Trollius dependency: done!
• Port greenio to Trollius. greenio schedules work using greenlet and so it doesn’t need a new thread, nor modifying the “main” function. Done! greenio 0.6 and trollius 1.0 released.
• Add greenio dependency in OpenStack: https://review.openstack.org/108637 Done!
• Add a “greenio” executor to Oslo Message supporting Trollius coroutines.
– Add a ‘greenio’ oslo.messaging executor (spec)
– blueprint
– patch
• Ceilometer
– Set Trollius event loop to greenio
– Change Oslo Messaging executor for “greenio”
– Slowly replace eventlet code (implicit async) with Trollius coroutines (explicit async). It involves many
dependencies. For example, the service class of Oslo Incubator heavily depends on eventlet and uses a
strange hack on file descriptors because the class is instantiated before the fork. Todo: instantiate the
class after.
– Then use a classic Trollius event loop (using a selector) and drop the eventlet dependency
• Oslo Messaging: watch directly the underlying file descriptor of sockets, instead of using a busy loop polling
the notifier
• Ceilometer: use libraries supporting directly asyncio to be able to run parallel tasks (ex: send multiple requests
to a database)
Old plan: Use the new asyncio module and Trollius in OpenStack.
8.7.2 Trollius
• Trollius documentation
• Trollius project in the Python Cheeseshop (PyPI)
• Trollius source code on Bitbucket
8.7.3 Merged patches
• openstack/requirements: Add a new dependency: trollius
8.7.4 Pending patches
• Fix AMQPListener for polling with timeout
• openstack/requirements: Allow trollius 0.2
• Oslo Messaging:
– Add an optional timeout parameter to Listener.poll
– Add a new asynchronous executor based on Trollius
– Blueprint: Add a asyncio executor to oslo.messaging
– Full specification
• Heat: Replace ad-hoc coroutines with Trollius coroutines. Heat coroutines are close to Trollius coroutines.
CHAPTER 9
Time
I wrote the Python PEP 418 and I added new functions to Python 3.3:
• time.monotonic(): timeout and scheduling, not affected by system clock updates
• time.perf_counter(): benchmarking, most precise clock for short period
• time.process_time(): profiling, CPU time of the process
On Linux older than 2.6.28, select(), poll() and epoll() use kernel jiffies and so have a resolution of 1/HZ. HZ can be
read from sysconf(_SC_CLK_TCK) in C, os.sysconf('SC_CLK_TCK') in Python.
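A short sketch reading these values from Python 3.3 or newer. The function names clock_ticks_per_second and clock_resolutions are hypothetical; os.sysconf() and time.get_clock_info() are standard library calls:

```python
import os
import time

def clock_ticks_per_second():
    """Return the value of sysconf(_SC_CLK_TCK), or None if unavailable.

    os.sysconf() only exists on UNIX; None is returned elsewhere.
    """
    if not hasattr(os, "sysconf"):
        return None
    return os.sysconf("SC_CLK_TCK")

def clock_resolutions():
    """Map each PEP 418 clock to its advertised resolution in seconds."""
    return {name: time.get_clock_info(name).resolution
            for name in ("monotonic", "perf_counter", "process_time")}

if __name__ == "__main__":
    print("SC_CLK_TCK:", clock_ticks_per_second())
    for name, resolution in sorted(clock_resolutions().items()):
        print("%s: %.3e sec" % (name, resolution))
```

The resolution reported by time.get_clock_info() is the one announced by the operating system; the effective precision may be worse.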
Linux 2.6.28:
• High- (but not too high-) resolution timeouts
• What’s in hrtimer.git for 2.6.28
• prctl(PR_GET_TIMERSLACK): 50 us by default.
Python module to get/set the timer slack: prctl.get_timerslack(value)
Windows:
• libvirt: if HPET is present in the XML file of the Virtual Machine:
<clock offset=’localtime’>
<timer name=’hpet’ present=’yes’/>
</clock>
• Enable HPET:
– open a console in administrator mode
– type bcdedit /set useplatformclock true
– reboot
• Disable HPET:
– open a console in administrator mode
– type bcdedit /deletevalue useplatformclock
– reboot
• HPET enabled:
– time.time() incremented by 15.6 ms,
– time.monotonic() incremented by 15.0 or 16.0 ms,
– time.process_time() incremented by 15.6 ms (but unrelated to system time, it’s the CPU usage time),
– time.perf_counter() is always incremented (less than 0.2 ms)
– time.get_clock_info('perf_counter').resolution: 10 ns (1.0e-08) => 100 MHz
– HPET frequency should be 14.3 MHz
• HPET disabled:
– xxx
• HPET device missing (ex: disabled in the BIOS):
– time.time() incremented by 15.6 ms,
– time.monotonic() incremented by 15.0 or 16.0 ms,
– time.get_clock_info('perf_counter').resolution: 279.4 ns (2.79e-7) => 3.579 MHz (ACPI Power Management Timer)
• Control Panel, devices (names in the French version of Windows):
– High precision event timer (“Compteur d’événements de haute précision”) => HPET
– System CMOS/real time clock (“Horloge système CMOS/temps réel”)
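The increments listed above can be measured with a loop that spins until a clock returns a new value. This is only a sketch under that assumption; the helper name smallest_increment is hypothetical:

```python
import time

def smallest_increment(clock, samples=10):
    """Return the smallest non-zero step observed between two clock reads."""
    best = None
    for _ in range(samples):
        start = clock()
        now = clock()
        while now == start:          # spin until the clock value changes
            now = clock()
        step = now - start
        if best is None or step < best:
            best = step
    return best

if __name__ == "__main__":
    for name in ("time", "monotonic", "perf_counter", "process_time"):
        step = smallest_increment(getattr(time, name))
        print("time.%s(): %.1f us" % (name, step * 1e6))
```

The measured step includes the cost of calling the clock, so for a very precise clock it is an upper bound on the real increment.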
asyncio timeout rounding:
• http://code.google.com/p/tulip/issues/detail?id=106
• http://bugs.python.org/issue20311
• http://bugs.python.org/issue20452
• http://bugs.python.org/issue20505
Other:
• http://www.windowstimestamp.com/
• http://www.haypocalc.com/wiki/Temps
CHAPTER 10
Fragmentation of the Heap Memory
• http://python.dzone.com/articles/diagnosing-memory-leaks-python
• http://bugs.python.org/issue11849
• high fragmentation of the memory heap on Windows http://bugs.python.org/issue19246
• http://bmaurer.blogspot.it/2006/03/memory-usage-with-smaps.html
• https://mail.gnome.org/archives/gnome-list/1999-September/msg00036.html
• http://www.canonware.com/jemalloc/
• When Linux Runs Out of Memory, by Mulyadi Santosa (11/30/2006)
http://www.linuxdevcenter.com/pub/a/linux/2006/11/30/linux-out-of-memory.html?page=1
• glibc: 3 Virtual Memory Allocation And Paging http://www.gnu.org/software/libc/manual/html_node/Memory.html#Memory
• malloc_trim()
• mallopt():
– M_TRIM_THRESHOLD: default=256 * 1024
– M_MMAP_THRESHOLD: default=256 * 1024
– M_MMAP_MAX: default=65536
• GNU libc
– glibc doc: Efficiency Considerations for malloc http://www.gnu.org/software/libc/manual/html_mono/libc.html#Efficiency-and-Malloc
– mmap threshold: 256 KB by default with the GNU libc
– implementation: ptmalloc
• uClibc
– implementation: dlmalloc, written by Doug Lea
– uClibc config: MALLOC_STANDARD=y MALLOC_GLIBC_COMPAT=y
– malloc for small allocations, while using mmap: the i386 implementation is different, see Reorganizing
the address space http://lwn.net/Articles/91829/
CHAPTER 11
Python Memory
11.1 TODO
• Compute fragmentation of pymalloc
• Compute fragmentation of the glibc heap
• Tool to visualize the fragmentation?
11.2 Benchmarks
https://bitbucket.org/haypo/misc/raw/tip/python/python_memleak.py
Issue #13483: http://bugs.python.org/file26069/tuples.py
11.3 Memory Fragmentation
• Improving Python’s Memory Allocator, Evan Jones: http://www.evanjones.ca/memoryallocator/
• MemoryError when using highlight in hgweb: http://bz.selenic.com/show_bug.cgi?id=3005
11.4 Python Memory Allocators
• PyMem_RawMalloc(), PyMem_RawRealloc(), PyMem_RawFree(): new in Python 3.4
• PyMem_Malloc(), PyMem_Realloc(), PyMem_Free()
• PyObject_Malloc(), PyObject_Realloc(), PyObject_Free()
Current allocators:
• PyMem_RawMalloc(): system malloc()
• PyMem_Malloc(): system malloc()
• PyObject_Malloc(): pymalloc
http://www.python.org/dev/peps/pep-0445/
11.5 Windows
• VirtualAlloc
• Windows Low Fragmentation Heap (LFH)
• Python: http://bugs.python.org/issue13483
Links:
• http://smallvoid.com/article/winnt-memory-decommit.html
11.6 pymalloc
• Used by PyObject_Malloc()
• Threshold of 512 bytes
• Arenas of 256 KB
Arena allocator:
• Windows: VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
• UNIX: mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
• Other: malloc(size);
11.7 Linux
/proc/self/statm:
size       total program size (same as VmSize in /proc/[pid]/status)
resident   resident set size (same as VmRSS in /proc/[pid]/status)
share      shared pages (from shared mappings)
text       text (code)
data       data + stack
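A minimal sketch parsing this file from Python. The function names parse_statm and read_statm are hypothetical. The values in statm are counted in pages, so they are multiplied by the page size. The file contains seven fields: besides the five listed above, proc(5) also documents "lib" and "dt", which are unused since Linux 2.6:

```python
import os

# Field names in the order used by /proc/[pid]/statm.
STATM_FIELDS = ("size", "resident", "share", "text", "lib", "data", "dt")

def parse_statm(content, page_size):
    """Convert the content of a statm file to a dict of sizes in bytes."""
    values = [int(field) for field in content.split()]
    return {name: pages * page_size
            for name, pages in zip(STATM_FIELDS, values)}

def read_statm(path="/proc/self/statm"):
    """Read and parse a statm file (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(path) as fp:
        return parse_statm(fp.read(), page_size)
```

Calling read_statm() before and after a large allocation gives a rough view of how the resident set size evolves.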
http://linux-mm.org/Low_On_Memory
/proc, https://www.kernel.org/doc/Documentation/filesystems/proc.txt:
/proc/buddyinfo
/proc/pagetypeinfo
/proc/slabinfo => slabtop program
/proc/meminfo
/proc/vmallocinfo
CHAPTER 12
Faster CPython
12.1 Ideas
• PyPy CALL_METHOD instruction
• Lazy formatting of Exception message: in most cases, the message is not used. AttributeError(message) =>
AttributeError(attr=name), lazy formatting for str(exc) and exc.args.
12.2 Plan
• Modify CPython to be notified when the Python code is changed
• Learn types of function parameters and variables
• Compile a specialized version of a function using types and platform information: more efficient bytecode
using an AST optimizer, or even emit machine code. The compilation is done once and not during the execution,
it’s not a JIT.
• Choose between bytecode and specialized code at runtime
Other idea:
• registervm: My fork of Python 3.3 using register-based bytecode, instead of stack-based bytecode. Read REGISTERVM.txt
• Kill the GIL?
12.3 Status
See also the status of individual projects:
• READONLY.txt
• REGISTERVM.txt
• astoptimizer TODO list
12.3.1 Done
• astoptimizer project exists: astoptimizer.
• Fork of CPython 3.5: be notified when the Python code is changed: modules, types and functions are tracked.
My fork of CPython 3.5: readonly; read READONLY.txt documentation.
Note: “readonly” is no longer a good name for the project. The name comes from a first implementation using read-only
code.
12.3.2 To do
• Learn types
• Enhance astoptimizer to use the type information
• Emit machine code
12.4 Why Python is slow?
12.4.1 Why the CPython implementation is slower than PyPy?
• everything is stored as an object, even simple types like integers or characters. Computing the sum of two
numbers requires “unboxing” the objects, computing the sum, and “boxing” the result.
• Python maintains different states: thread state, interpreter state, frames, etc. This information is available in
Python. The common use case is to display a traceback in case of a bug. PyPy builds frames on demand.
• Cost of maintaining the reference counter: Python programs rely on the garbage collector
• ceval.c uses a virtual stack instead of CPU registers
12.4.2 Why the Python language is slower than C?
• modules are mutable, classes are mutable, etc. Because of that, it is not possible to inline code nor replace a
function call by its result (ex: len(“abc”)).
• The types of function parameters and variables are unknown. Example of missing optimizations:
– “obj.attr” instruction cannot be moved out of a loop: “obj.attr” may return a different result at each call, or
execute arbitrary Python code
– x+0 raises a TypeError for “abc”, whereas it is a noop for int (it can be replaced with just x)
– conditional code becomes dead code when types are known
• obj.method creates a temporary bound method
12.4.3 Why improving CPython instead of writing a new implementation?
• There are already a lot of other Python implementations. Some examples: PyPy, Jython, IronPython, Pyston.
• CPython remains the reference implementation: new features are first implemented in CPython. For example,
PyPy doesn’t support Python 3 yet.
• Important third party modules rely heavily on CPython implementation details, especially the Python C API.
Examples: numpy and PyQt.
12.4.4 Why not a JIT?
• writing a JIT is much more complex, it requires deep changes in CPython; CPython code is old (+20 years)
• cost to “warm up” the JIT: the Mercurial project is concerned by the Python startup time
• Store generated machine code?
12.5 Optimizations
12.5.1 Inline function calls
Example:
def _get_sep(path):
    if isinstance(path, bytes):
        return b'/'
    else:
        return '/'

def isabs(s):
    """Test whether a path is absolute"""
    sep = _get_sep(s)
    return s.startswith(sep)
Inline _get_sep() into isabs() and simplify the code for the str type:
def isabs(s: str):
    return s.startswith('/')
It can be implemented as a simple call to the C function PyUnicode_Tailmatch().
Note: Inlining uses more memory and disk because the original function should be kept. Except if the inlined function
is unreachable (ex: “private function”?).
12.5.2 Move invariants out of the loop
Example:
def func(obj, lines):
    for text in lines:
        print(obj.cleanup(text))
Becomes:
def func(obj, lines):
    local_print = print
    obj_cleanup = obj.cleanup
    for text in lines:
        local_print(obj_cleanup(text))
Local variables are faster than global variables and the attribute lookup is only done once.
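The effect can be measured with timeit. The sketch below is an illustration under assumptions: the Cleaner class is hypothetical, and print() is replaced with list.append() so that the benchmark does not write to the console. No timing is claimed here, since results depend on the Python version and the machine:

```python
import timeit

class Cleaner:
    """Hypothetical object with a cleanup() method, for illustration only."""
    def cleanup(self, text):
        return text.strip()

def func_plain(obj, lines):
    result = []
    for text in lines:
        result.append(obj.cleanup(text))   # two attribute lookups per iteration
    return result

def func_hoisted(obj, lines):
    result = []
    result_append = result.append          # lookups done once, before the loop
    obj_cleanup = obj.cleanup
    for text in lines:
        result_append(obj_cleanup(text))
    return result

if __name__ == "__main__":
    obj = Cleaner()
    lines = ["  text  "] * 1000
    for func in (func_plain, func_hoisted):
        elapsed = timeit.timeit(lambda: func(obj, lines), number=1000)
        print("%s: %.3f sec" % (func.__name__, elapsed))
```

The transformation is only valid if obj.cleanup cannot change during the loop, which is exactly what an optimizer cannot prove without type information.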
12.5.3 C functions using only C types
Optimizations:
• Avoid reference counting
• Avoid memory allocations on the heap
• Release the GIL
Example:
def demo():
    s = 0
    for i in range(10):
        s += i
    return s
In specialized code, it may be possible to use basic C types like char or int instead of Python objects. C variables
can be allocated on the stack, instead of allocating objects on the heap. i and s variables are integers in the
range [0; 45] and so a simple C type int (or even char) can be used:
PyObject *demo(void)
{
    int s, i;
    Py_BEGIN_ALLOW_THREADS
    s = 0;
    for(i=0; i<10; i++)
        s += i;
    Py_END_ALLOW_THREADS
    return PyLong_FromLong(s);
}
Note: if the function is slow, we may need to check sometimes if a signal was received.
12.5.4 Release the GIL
Many methods of builtin types don’t need the GIL. Example: "abc".startswith("def").
12.5.5 Replace calls to pure functions with the result
Examples:
• len(’abc’) becomes 3
• "python2.7".startswith("python") becomes True
• math.log(32) / math.log(2) becomes 5.0
Can be implemented in the AST optimizer.
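As an illustration, here is a minimal sketch of such a transformation written with the standard ast module. It is not the astoptimizer implementation: the class name PureCallFolder and the PURE_BUILTINS table are hypothetical, only len() is handled, and the code assumes Python 3.8 or newer (ast.Constant):

```python
import ast

# Builtins considered pure for this sketch. A real optimizer must also prove
# that the name has not been rebound (ex: a module-level "len = ...").
PURE_BUILTINS = {"len": len}

class PureCallFolder(ast.NodeTransformer):
    """Replace calls like len('abc') with their result (here: 3)."""

    def visit_Call(self, node):
        self.generic_visit(node)           # optimize nested calls first
        if (isinstance(node.func, ast.Name)
                and node.func.id in PURE_BUILTINS
                and not node.keywords
                and all(isinstance(arg, ast.Constant) for arg in node.args)):
            func = PURE_BUILTINS[node.func.id]
            try:
                value = func(*(arg.value for arg in node.args))
            except Exception:
                return node                # keep the call: it fails at runtime
            return ast.copy_location(ast.Constant(value=value), node)
        return node

def optimize(source):
    """Parse source, fold pure calls, and return the optimized AST."""
    tree = PureCallFolder().visit(ast.parse(source))
    return ast.fix_missing_locations(tree)
```

The returned tree can be passed to compile() like any other AST. Calls which would raise an exception are left untouched, so the error is still reported at runtime.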
12.5.6 Constant folding
Replace constants by their values. Simple example from pickle.py:
MARK = b'('
TUPLE = b't'

def func():
    ...
    self.write(MARK + TUPLE)
The function becomes:
def func():
    ...
    self.write(b'(t')
Can be implemented in the AST optimizer.
12.5.7 Peephole optimizer
Examples:
• x+0 => x if x is an int
• x*0 => 0 if x is an int
• x*1 => x if x is an int, str or a tuple
• x and True
• x or False
• x = x + 1 => x += 1 if x is an int
12.5.8 Unroll loops
Example:
for i in range(4):
    print(i)
The loop body can be duplicated (twice in this example) to reduce the cost of a loop:
for i in range(0,4,2):
    print(i)
    print(i+1)
i = 3
Or:
print(0)
print(1)
print(2)
print(3)
i = 3
12.5.9 Remove dead code
• if DEBUG: print("debug") where DEBUG is known to be False
12.5.10 Load globals when the module is loaded
Load globals when the module is loaded? Ex: load “print” name when the module is loaded.
Example:
def hello():
    print("Hello World")
Becomes:
local_print = print

def hello():
    local_print("Hello World")
Useful if hello() is compiled to C code.
12.5.11 Don’t create Python frames
Inlining and other optimizations don’t create Python frames anymore. This can be a serious issue when debugging programs:
tracebacks are an important feature of Python.
At least in debug mode, frames should be created.
PyPy supports lazy creation of frames if an exception is raised.
12.6 Learn types
• Add code in the compiler to record types of function calls. Run your program. Use recorded types.
• Range of numbers (predict C int overflow)
• Optional parameters: forceload=0. Dead code with forceload=0.
• Count number of calls to the function to decide if it should be optimized or not.
• Measure the time spent in a function. It can be used to decide whether it’s useful to release the GIL or not.
• Store type information directly in the source code? Manual type annotation?
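The first and fourth items of the list above can be prototyped in pure Python before touching the compiler. The following is only a sketch: the record_types() decorator and the RECORDED_TYPES registry are hypothetical names, and only positional arguments are recorded.

```python
import collections
import functools

# Hypothetical registry: function name -> Counter of argument type tuples.
RECORDED_TYPES = collections.defaultdict(collections.Counter)


def record_types(func):
    """Record the exact types of the positional arguments of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        signature = tuple(type(arg).__name__ for arg in args)
        RECORDED_TYPES[func.__name__][signature] += 1
        return func(*args, **kwargs)
    return wrapper


@record_types
def mysum(a, b):
    return a + b


mysum(1, 2)
mysum(3, 4)
mysum("a", "b")

# Most frequent signature: candidate for a specialized version.
print(RECORDED_TYPES["mysum"].most_common(1))  # [(('int', 'int'), 2)]
# Total number of calls: is the function hot enough to be optimized?
print(sum(RECORDED_TYPES["mysum"].values()))   # 3
```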
12.7 Emit machine code
• Limited to simple types like integers?
• Use LLVM?
• Reuse Cython or numba?
• Replace bytecode with C functions calls. Ex: instead of PyNumber_Add(a, b) for a+b, emit PyUnicode_Concat(a, b), long_add(a, b) or even simpler code without unbox/box
• Calling convention: have two versions of the function? only emit the C version if it is needed?
– Called from Python: Python C API, PyObject* func(PyObject *args, PyObject *kwargs)
– Called from C (specialized machine code): C API, int func(char a, double d)
– Version which doesn’t need the GIL to be locked?
• Option to compile a whole application into machine code for proprietary software?
12.7.1 Example of (specialized) machine code
Python code:
def mysum(a, b):
return a + b
Python bytecode:
0 LOAD_FAST     0 (a)
3 LOAD_FAST     1 (b)
6 BINARY_ADD
7 RETURN_VALUE
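The listing above can be reproduced with the standard dis module. Opcode names and offsets depend on the CPython version (for example, recent versions emit BINARY_OP instead of BINARY_ADD), so the output of this short sketch may differ from the listing:

```python
import dis


def mysum(a, b):
    return a + b


# Print the disassembly of the function (format depends on the version).
dis.dis(mysum)

# Instruction names only, easier to process in a script.
opnames = [instr.opname for instr in dis.get_instructions(mysum)]
print(opnames)
```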
C code used to execute this bytecode (without the code to read bytecode and handle signals):
/* LOAD_FAST */
{
    PyObject *value = GETLOCAL(0);
    if (value == NULL) {
        format_exc_check_arg(PyExc_UnboundLocalError, ...);
        goto error;
    }
    Py_INCREF(value);
    PUSH(value);
}
/* LOAD_FAST */
{
    PyObject *value = GETLOCAL(1);
    if (value == NULL) {
        format_exc_check_arg(PyExc_UnboundLocalError, ...);
        goto error;
    }
    Py_INCREF(value);
    PUSH(value);
}
/* BINARY_ADD */
{
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *sum;
    if (PyUnicode_CheckExact(left) &&
        PyUnicode_CheckExact(right)) {
        sum = unicode_concatenate(left, right, f, next_instr);
        /* unicode_concatenate consumed the ref to v */
    }
    else {
        sum = PyNumber_Add(left, right);
        Py_DECREF(left);
    }
    Py_DECREF(right);
    SET_TOP(sum);
    if (sum == NULL)
        goto error;
}
/* RETURN_VALUE */
{
    retval = POP();
    why = WHY_RETURN;
    goto fast_block_end;
}
Specialized and simplified C code if both arguments are Unicode strings:
/* LOAD_FAST */
PyObject *left = GETLOCAL(0);
if (left == NULL) {
    format_exc_check_arg(PyExc_UnboundLocalError, ...);
    goto error;
}
Py_INCREF(left);
/* LOAD_FAST */
PyObject *right = GETLOCAL(1);
if (right == NULL) {
    format_exc_check_arg(PyExc_UnboundLocalError, ...);
    goto error;
}
Py_INCREF(right);
/* BINARY_ADD */
PyUnicode_Append(&left, right);
Py_DECREF(right);
/* PyUnicode_Append() sets left to NULL on error */
if (left == NULL)
    goto error;
/* RETURN_VALUE */
retval = left;
why = WHY_RETURN;
goto fast_block_end;
12.8 Test if the specialized function can be used
Write code to choose between the bytecode evaluation and the machine code.
Preconditions:
• Check if os.path.isabs() was modified:
– current namespace was modified? (os name cannot be replaced)
– namespace of the os.path module was modified?
– os.path.isabs function was modified?
– compilation: checksum of the os.py and posixpath.py?
• Check the exact type of arguments
– x type is str: in C, PyUnicode_CheckExact(x)
– list of int: check the whole array before executing code? fallback in the specialized code to handle non int
items?
• Callback to use the slow-path if something is modified?
• Disable optimizations when tracing is enabled
• Online benchmark to decide if checking the preconditions and running the optimized code is faster than the original code?
12.9 Kill the GIL?
12.9.1 Why does CPython need a global lock?
Incomplete list:
• Python memory allocation is not thread safe (it should be easy to make it thread safe)
• The reference counter of each object is protected by the GIL.
• CPython has a lot of global C variables. Examples:
– interp is a structure which contains variables of the Python interpreter: modules, list of Python threads,
builtins, etc.
– int singletons (-5..255)
– str singletons (Python 3: latin1 characters)
• Some third party C libraries and even some functions of the C standard library are not thread safe: the GIL works around
this limitation.
12.9.2 Kill the GIL
• Requires deep changes of CPython code
• The current Python C API is too specific to CPython implementation details: need a new API. Maybe the stable
ABI?
• Modify third party modules to use the stable ABI to avoid relying on CPython implementation details like
reference counting
• Replace reference counting with something else? Atomic operations?
• Use finer locks on some specific operations (release the GIL)? like operations on builtin types which don’t need
to execute arbitrary Python code. Counter example: dict where keys are objects different than int and str.
See also pyparallel.
12.10 Links
12.10.1 Fully Python compliant
• PyPy
• Jython based on the JVM
• IronPython based on the .NET VM
• Unladen Swallow fork of CPython 2.6 using LLVM
– Unladen Swallow Retrospective
– PEP 3146
12.10.2 Fully Python compliant??
• psyco
12.10.3 Subset of Python to C++
• Nuitka
• Python2C
• Shedskin
• pythran (no class, set, dict, exception, file handling, ...)
12.10.4 Subset of Python
• pymothoa: uses LLVM; doesn’t support classes or exceptions.
• unpython: Python to C
• Perthon: Python to Perl
• Copperhead: Python to GPU (Nvidia)
12.10.5 Language very close to Python
• Cython: “Cython is a programming language based on Python, with extra syntax allowing for optional static
type declarations.”
– based on Pyrex
12.10.6 Misc links
• “Need for speed” sprint (2006)
• ceval.c: use registers?
– Java: Virtual Machine Showdown: Stack Versus Registers (Yunhe Shi, David Gregg, Andrew Beatty, M.
Anton Ertl, 2005)
– Lua 5: The Implementation of Lua 5.0 (Roberto Ierusalimschy, Luiz Henrique de Figueiredo, Waldemar
Celes, 2005)
– Python-ideas: Register based interpreter
– unladen-swallow: ProjectPlan: “Using a JIT will also allow us to move Python from a stack-based machine to a register machine, which has been shown to improve performance in other similar languages
(Ierusalimschy et al, 2005; Shi et al, 2005).”
• Use a more efficient VM
• WPython: 16-bit word-codes instead of byte-codes
• Hotpy and Hotpy 2: built using the GVMT (The Glasgow Virtual Machine Toolkit)
• Search for Python issues of type performance: http://bugs.python.org/
• Volunteer developed free-threaded cross platform virtual machines?
CHAPTER 13
Microbenchmarks
13.1 Memory
• What Every Programmer Should Know About Memory
– HTML version (first article which ends with links to the following articles)
– PDF version
13.2 Help compiler to optimize
• const keyword?
• aliasing: -fno-strict-aliasing or __restrict__
13.3 Aliasing
• Understanding Strict Aliasing (Mike Acton, June 1, 2006)
• Demystifying The Restrict Keyword (Mike Acton, May 29, 2006)
CHAPTER 14
Unsorted Notes
14.1 vim for developer
In these examples, I’m using Mercurial with the command “hg”. To use git, just replace “hg” with “git”. I prefer the
graphical editor gvim. To use the console version, replace “gvim” with “vim”.
View differences:
hg diff | gvim -
Shortcuts:
• a/asyncio/events.py: to open the file, delete a/, put the cursor on the file name, type :vs for a vertical
split, and type gf (goto file) to open the file
14.2 Windows console
• Kill a blocked command (stronger than CTRL+c): CTRL + Scroll Lock key, which sends a SIGBREAK signal
• Redirect stdout and stderr into the file output.log: command >output.log 2>&1
14.3 Fedora
Search a package without updating yum cache:
yum search -C pattern
Which package provides the program route?
$ rpm -qf $(which route)
net-tools-2.0-0.15.20131119git.fc20.x86_64
Or if the package is not installed:
$ yum whatprovides route
...
net-tools-2.0-0.15.20131119git.fc20.x86_64 : Basic networking tools
...
Filename: /usr/sbin/route
Install dependencies to build the package digikam:
yum-builddep digikam
Rebuild a package: Fedora Source RPM.
14.4 Python
iter(obj):
• obj.__iter__()
• obj.__getitem__()
bool(obj):
• obj.__nonzero__() (Python 2; obj.__bool__() in Python 3)
• obj.__len__() != 0
item in obj:
• obj.__contains__()
• list-like: obj.__getitem__(0), obj.__getitem__(1), and so on, until obj.__getitem__(index) returns item or raises IndexError
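These fallbacks can be observed with a small class which only defines __getitem__() and __len__(). The Legacy class is a hypothetical example, written for Python 3:

```python
class Legacy:
    """Sequence-like class defining only __getitem__() and __len__()."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # The IndexError raised by the list ends the iteration.
        return self._items[index]

    def __len__(self):
        return len(self._items)


seq = Legacy("abc")
# No __iter__(): iter() calls __getitem__(0), __getitem__(1), ...
print(list(iter(seq)))   # ['a', 'b', 'c']
# No __contains__(): the in operator falls back to the iteration.
print("b" in seq)        # True
# No __bool__(): bool() falls back to __len__() != 0.
print(bool(Legacy("")))  # False
```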
14.5 libdbus
http://lists.freedesktop.org/pipermail/dbus/2013-July/015727.html
I’m not convinced that it’s possible to give libdbus well-designed multi-threading without a redesign and API break,
at which point you might as well use GDBus instead.
14.6 Code search
• http://code.ohloh.net/
• https://github.com/search/
• http://searchcode.com/
• http://stackoverflow.com/search?q=codecs.encode
14.7 Git remote branches
• List remote branches: git branch -r
• Create a new branch fix_1369426_icehouse tracking the remote branch origin/stable/icehouse:
git branch --track fix_1369426_icehouse origin/stable/icehouse
• (Track and) Pull a remote branch:
git branch --track NAME_REMOTE_BRANCH
git fetch --all
# or: git pull --all
14.8 OpenStreetMap
Map of the town Peypin:
• OpenStreetMap
• Google Maps
• Yahoo Maps
• OSMOSE
• BANO
• KeepItRight
• viamichelin
• Cadastre
Marseille user group:
• https://wiki.openstreetmap.org/wiki/Marseille#Rencontres_mensuelles
• https://wiki.openstreetmap.org/wiki/Marseille/R%C3%A9unions_2014
• http://listes.openstreetmap.fr/wws/info/local-marseille
Wiki:
• http://wiki.openstreetmap.org/wiki/FR:Quality_assurance
• http://wiki.openstreetmap.org/wiki/FR:Map_Features
• BANO
14.9 Shell script
• bash8: A pep8 equivalent for bash scripts
• checkbashisms: static analysis tool for shell scripts. It looks for particular patterns which indicate a script might
be relying on /bin/sh being bash.
• shellcheck: static analysis and linting tool for sh/bash scripts
14.10 Python zero copy
Python3:
offset = 0
view = memoryview(large_data)
# stop when all bytes have been written
while offset < len(view):
    chunk = view[offset:offset + 4096]
    offset += file.write(chunk)
This code creates views on large_data: slicing a memoryview does not copy any bytes in memory.
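The absence of copy can be checked with a short self-contained sketch. Here large_data is a bytearray and the file is replaced with an io.BytesIO object; both are assumptions made only to get a runnable example:

```python
import io

large_data = bytearray(b"x" * 10000)
view = memoryview(large_data)
chunk = view[0:4096]

# The slice is another memoryview on the same buffer, not a bytes copy.
print(type(chunk).__name__, len(chunk))  # memoryview 4096
print(chunk.obj is large_data)           # True

# A change made to the buffer is visible through the existing slice.
large_data[0] = ord("y")
print(bytes(chunk[:1]))                  # b'y'

# Write the whole buffer chunk by chunk, as in the loop above.
out = io.BytesIO()
offset = 0
while offset < len(view):
    offset += out.write(view[offset:offset + 4096])
print(offset)                            # 10000
```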
14.11 Ftrace
• LWN articles:
– Secrets of the Ftrace function tracer
– Debugging the kernel using Ftrace - part 1
– A look at ftrace
– Debugging the kernel using Ftrace - part 2
– Ftrace: The hidden light switch
• ftrace - Function Tracer: official documentation from the kernel
• ftrace at elinux.org
• Kernel dynamic memory analysis
• Installing and Using Ftrace
14.12 MySQL
MySQL client:
SELECT * FROM instances WHERE uuid='f12dca29-c51e-4f94-be5a-8aba5dd3c952' \G
14.13 Mercurial: find tags containing a specific changeset
Let’s say that you want to check which versions contain the _FUTURE_CLASSES variable:
$ grep '_FUTURE_CLASSES =' trollius/*.py
trollius/futures.py:    _FUTURE_CLASSES = (Future, events.asyncio.Future)
trollius/futures.py:    _FUTURE_CLASSES = Future
$ hg blame trollius/futures.py|grep '_FUTURE_CLASSES ='
1712:     _FUTURE_CLASSES = (Future, events.asyncio.Future)
1688:     _FUTURE_CLASSES = Future
$ hg log -r 1688 --template '{date|isodate}\n'
2014-07-25 10:05 +0200
OK, so _FUTURE_CLASSES was added by changeset 1688, which was made on 2014-07-25. We pick the
oldest changeset; 1712 was probably a fix.
Find the tags which contain the changeset 1688:
$ hg log -r "reverse(descendants(1688)) and tag()" --template "{tags}\t{rev}:{node|short}\n"
trollius-1.0.2 1767:41ac07cd2d03
trollius-1.0.1 1738:83e574a42e16
$ hg log -r trollius-1.0.1 --template '{date|isodate}\n'
2014-07-30 17:45 +0200
$ hg log -r trollius-1.0.2 --template '{date|isodate}\n'
2014-10-02 16:47 +0200
_FUTURE_CLASSES was introduced in trollius-1.0.1, which was released on 2014-07-30. The following release,
trollius-1.0.2 (2014-10-02), also contains it, which is expected since trollius-1.0.2 is based on trollius-1.0.1.
Check versions:
$ hg up trollius-1.0.1
$ grep '_FUTURE_CLASSES =' trollius/*.py
trollius/futures.py:    _FUTURE_CLASSES = (Future, events.asyncio.Future)
trollius/futures.py:    _FUTURE_CLASSES = Future
$ hg up trollius-1.0
$ grep '_FUTURE_CLASSES =' trollius/*.py
trollius/tasks.py:    _FUTURE_CLASSES = (futures.Future, asyncio.Future)
trollius/tasks.py:    _FUTURE_CLASSES = futures.Future
OK, so in fact the variable was moved from the Python module trollius.tasks to the module
trollius.futures between versions 1.0 and 1.0.1.
14.14 Misc
• Linux: detect launching of programs (StackOverflow)
• CI
– https://drone.io/
– https://travis-ci.org/
• Validity of an employment contract clause about the employer publishing software under a free license
CHAPTER 15
Indices and tables
• genindex
• modindex
• search