Design¶
Briefly about the main module myquerybuilder.builder
. It is about two hundred logical lines
of code. It’s mere string building that gives up most aspects of operation, like escaping,
connectivity and types, to the underlying MySQL adapters. Nothing really technically fancy, but
rather simple thin convenience wrapper.
For supported adapters see Install.
Autocommit¶
PEP-249 [1] proposes (it doesn’t mandate if you still remember what the last P stands for), auto-commit to be initially off. To speak it outright in case of MySQL, it is blatantly wrong.
Not only it is counterintuitive, different to DBAPIs in other languages, breaks the Zen of Python, explicit is better than implicit, and leads to countless and constant surprise of beginners like where is my data?, why does MyISAM work but InnoDB doesn’t?, why does SELECT work but UPDATE doesn’t?, Why doesn’t MySQL INSERT into database? and so on and so forth [2] [3] [4]. It also leads more established developers to scratch their heads, because InnoDB’s default transaction isolation is REPEATABLE READ [5], and the reader will only receive its first snapshot [6].
Read-only transactions lead to many performance issues [7] [8] and complicate application design when frameworks do weird twist pulling out consistency matters from their inherent domain model origin to request level. In that way even monstrous things like Django with huge codebases and the established practice rethink the decision and enable auto-commit [9].
Because decisions in domain model consistency should be conscious and elaborate, MySQL Simple Query Builder sets auto-commit to the server default setting, which is usually ON.
Nested transaction¶
InnoDB supports transaction savepoints [10]. These are named sub-transactions that run in the
scope of a normal transaction. Having begin
, commit
and rollback
that always behave
in a semantically correct way is usually beneficial. It’s so for atomic domain object methods’
that may call other atomic methods, for unit and integration tests and other cases. Here’s how
nested calls correspond to queries.
call | query |
---|---|
qb.begin() | BEGIN |
qb.begin() | SAVEPOINT LEVEL1 |
qb.begin() | SAVEPOINT LEVEL2 |
qb.rollback() | ROLLBACK TO SAVEPOINT LEVEL2 |
qb.commit() | RELEASE SAVEPOINT LEVEL1 |
qb.rollback() | ROLLBACK |
qb.begin() | BEGIN |
qb.commit() | COMMIT |
Warning
In MySQL you can query unmatched BEGIN, COMMIT and ROLLBACK without an error.
For instance you can ROLLBACK without preceding BEGIN, or several BEGIN queries
in row. MySQL just ignores the queries that make no sense. This is no longer the case
with myquerybuilder.builder.QueryBuilder
which maintains stack order
transaction level counter, and raises OperationalError
on unmatched calls.
If certain functionality doesn’t need to be atomic or possible speedup from grouping
statements into transactions is negligible, just rely in auto-commit and don’t bother.
Otherwise if you have decided to employ a transaction make sure you have the idea what
ACID [11] is, and what impact InnoDB transaction isolation levels have [17]. It’s
important for your transactional code to be properly wrapped in try...except
clause:
qb.begin()
try:
# your workload
qb.commit()
except:
qb.rollback()
raise
To reduce the boilerplate a simple decorator can be implemented as follows. Note that
the example is applicable only to methods (or functions whose first argument has
attribute _db
):
import functools
def transaction(fn):
'''Decorator that wraps a model method into transaction'''
@functools.wraps(fn)
def _transaction(self, *args, **kwargs):
self._db.begin()
try:
result = fn(self, *args, **kwargs)
self._db.commit()
except:
self._db.rollback()
raise
else:
return result
return _transaction
Implicit transaction completion¶
MySQL can implicitly (i.e. without your direct command), commit or roll back a transaction. If you use nested transactions and have the following exception, it’s very likely it is the case.
OperationalError: (1305, 'SAVEPOINT LEVEL1 does not exist')
MySQL implicitly commits a transaction in case of most DDLs, system tables modification, locking statements, data loading statements, administrative statements and more [12]. Implicit rollback occurs when [13]:
- Transaction deadlock is detected
- Lock wait is timed out (until 5.0.13 or if configured explicitly thereafter)
- Reconnection happened
The missing savepoint error is very likely originated from a deadlock. Take a look at
myquerybuilder.test.builder.TestQueryBuilderTransaction
for simulated implicit transaction
completion.
Concurrency¶
Underlying MySQL adapters are not thread-safe so isn’t the library. If you will share the query builder instances, and thus the DBAPI connection object, among threads most likely outcome for a libmysqlclient-based adapter is segmentation fault and crash of whole Python process. For pure-Python adapter you will face weird behaviour, lost data and out-of-sync errors.
Suggested way is one of the simplest. Use thread-mapped persistent connections. It is perfectly fine to have several dozens of established MySQL connections [14]. This way any overhead related to establishing connection is eliminated and you get your data as soon as possible.
Though a transparent connection pooling or thread-mapping may appear in future releases as it simplifies things.
Persistent connection¶
Persistent connection to MySQL server has its performance benefits, which are paid off by the need
to maintain the state and availability. For figuring out whether connection is alive while staying
idle for a long time, ping MySQL API function is used. It is also the reconnection function. The
library’s counterpart is ping()
.
Note
In case of web application and persistent connection, you need to call qb.ping(True)
before processing a request to ensure that connection is alive and to reconnect if it was lost.
Alternatively, you can wrap your public methods or classes with decorator as follows:
def ping(clsOrFn):
'''Class or function decorator that pings database connecting before execution
of the function(s)'''
if isinstance(clsOrFn, type):
for name, member in clsOrFn.__dict__.items():
if not name.startswith('_') and isinstance(member, types.FunctionType):
setattr(clsOrFn, name, ping(member))
return clsOrFn
elif isinstance(clsOrFn, types.FunctionType):
@functools.wraps(clsOrFn)
def _ping(self, *args, **kwargs):
self._db.ping()
return clsOrFn(self, *args, **kwargs)
return _ping
else:
raise TypeError
MySQL also has auto-reconnection feature [16]. It was even enabled by default from MySQL 5.0.0 to 5.0.3. It allows silent reconnection on any query when the client has found that the connection was lost. It may sound like a handy thing, but in fact it is a dangerous and discouraged one.
When reconnection does occur, no matter silent or deliberate, the connection’s state is severely affected, including but not limited to:
- Active transaction is rolled back
- All table locks are released
- All temporary tables are closed and dropped
- User variable settings are lost
In worst case the server may not yet know that the client has lost connection, thus the client that has just reconnected may find itself in a separate to its old transaction.
The subtle point with auto-reconnection and pinging is that MySQL-python (and mysqlclient as a
fork) accepts reconnect
argument for connection’s method ping
, which has a side-effect
that sets connection-wide auto-reconnection behaviour. In other words:
import MySQLdb
conn = MySQLdb.connect(user = 'guest')
conn.ping(True)
And your connection works in auto-reconnection mode and you should expect the unexpected later on.
It is subtle because ping’s docstring is the only source and it’s defined in _mysql.c
, so you
see one only if you ask it, help(conn.ping)
. There’s just a brief remark:
Accepts an optional reconnect parameter. If True, then the client will attempt reconnection. Note that this setting is persistent.
Therefore, to avoid the persistent side-effect when called with True
reconnection argument
ping()
calls underlying ping method with False
second time.
For simulated behaviour, see myquerybuilder.test.builder.TestQueryBuilder.testPing
.
MySQL versus¶
Here goes a note about what makes MySQL (which usually should be read as InnoDB) “special” and why it has been chosen for the library in the first place. Or in other words why not PostgreSQL? which you may hear claimed superior and “true” database (as opposite to MySQL) here and there.
I can’t say anything good or bad about PostgreSQL. It is just cut off. That is when one submits to Occam’s razor to gain mastery with one thing, rather what acquaintance with many. Thus nothing makes it special in the first place but the historic choice. However I will quote a translation of one good related comment I read long ago, which has only survived in my personal correspondence:
You may notice that Google hasn’t overgrown MySQL and still uses it in replicated setup with hundreds of geographically distributed servers. Makes sense? Why do Facebook, Yahoo and other giants nearly the same thing? So what makes your project so special that you say you have overgrown it?
There is no doubt that true scalability is sharding. Full-featured SQL, when you never shy to JOIN any table, unfortunately, doesn’t scale. In no way. Just forget about it. Eventually, you will be sharding your data and your business logic will retrace back to middleware. Where it belongs, though.
Of course it disturbs “true” RDBMS vendors who wish as most business logic as possible to reside inside the database. Otherwise they have nothing to sell you. So far as business logic lives in middleware, only a reliable storage is needed. This is what MySQL can cope with. Don’t trust me? Ask Google.
[1] | https://www.python.org/dev/peps/pep-0249/ |
[2] | http://stackoverflow.com/q/1028671/2072035 |
[3] | http://stackoverflow.com/q/1451782/2072035 |
[4] | http://stackoverflow.com/q/14445090/2072035 |
[5] | http://dev.mysql.com/doc/refman/5.5/en/set-transaction.html#isolevel_repeatable-read |
[6] | http://stackoverflow.com/q/25991345/2072035 |
[7] | https://blogs.oracle.com/mysqlinnodb/entry/better_scaling_of_read_only |
[8] | https://mariadb.com/blog/every-select-your-python-program-may-acquire-metadata-lock |
[9] | https://docs.djangoproject.com/en/1.7/topics/db/transactions/#autocommit |
[10] | http://dev.mysql.com/doc/refman/5.5/en/savepoint.html |
[11] | http://en.wikipedia.org/wiki/ACID |
[12] | http://dev.mysql.com/doc/refman/5.5/en/implicit-commit.html |
[13] | http://dev.mysql.com/doc/refman/5.0/en/innodb-error-handling.html |
[14] | http://stackoverflow.com/a/99565/2072035 |
[15] | https://docs.python.org/2/reference/datamodel.html#object.__del__ |
[16] | http://dev.mysql.com/doc/refman/5.0/en/auto-reconnect.html |
[17] | http://dimitrik.free.fr/blog/archives/02-01-2015_02-28-2015.html |