Design

Briefly about the main module myquerybuilder.builder. It is about two hundred logical lines of code. It’s mere string building that gives up most aspects of operation, like escaping, connectivity and types, to the underlying MySQL adapters. Nothing really technically fancy, but rather simple thin convenience wrapper.

For supported adapters see Install.

Autocommit

PEP-249 [1] proposes (it doesn’t mandate if you still remember what the last P stands for), auto-commit to be initially off. To speak it outright in case of MySQL, it is blatantly wrong.

Not only it is counterintuitive, different to DBAPIs in other languages, breaks the Zen of Python, explicit is better than implicit, and leads to countless and constant surprise of beginners like where is my data?, why does MyISAM work but InnoDB doesn’t?, why does SELECT work but UPDATE doesn’t?, Why doesn’t MySQL INSERT into database? and so on and so forth [2] [3] [4]. It also leads more established developers to scratch their heads, because InnoDB’s default transaction isolation is REPEATABLE READ [5], and the reader will only receive its first snapshot [6].

Read-only transactions lead to many performance issues [7] [8] and complicate application design when frameworks do weird twist pulling out consistency matters from their inherent domain model origin to request level. In that way even monstrous things like Django with huge codebases and the established practice rethink the decision and enable auto-commit [9].

Because decisions in domain model consistency should be conscious and elaborate, MySQL Simple Query Builder sets auto-commit to the server default setting, which is usually ON.

Nested transaction

InnoDB supports transaction savepoints [10]. These are named sub-transactions that run in the scope of a normal transaction. Having begin, commit and rollback that always behave in a semantically correct way is usually beneficial. It’s so for atomic domain object methods’ that may call other atomic methods, for unit and integration tests and other cases. Here’s how nested calls correspond to queries.

call query
qb.begin() BEGIN
qb.begin() SAVEPOINT LEVEL1
qb.begin() SAVEPOINT LEVEL2
qb.rollback() ROLLBACK TO SAVEPOINT LEVEL2
qb.commit() RELEASE SAVEPOINT LEVEL1
qb.rollback() ROLLBACK
qb.begin() BEGIN
qb.commit() COMMIT

Warning

In MySQL you can query unmatched BEGIN, COMMIT and ROLLBACK without an error. For instance you can ROLLBACK without preceding BEGIN, or several BEGIN queries in row. MySQL just ignores the queries that make no sense. This is no longer the case with myquerybuilder.builder.QueryBuilder which maintains stack order transaction level counter, and raises OperationalError on unmatched calls.

If certain functionality doesn’t need to be atomic or possible speedup from grouping statements into transactions is negligible, just rely in auto-commit and don’t bother. Otherwise if you have decided to employ a transaction make sure you have the idea what ACID [11] is, and what impact InnoDB transaction isolation levels have [17]. It’s important for your transactional code to be properly wrapped in try...except clause:

qb.begin()
try:
  # your workload
  qb.commit()
except:
  qb.rollback()
  raise

To reduce the boilerplate a simple decorator can be implemented as follows. Note that the example is applicable only to methods (or functions whose first argument has attribute _db):

import functools

def transaction(fn):
  '''Decorator that wraps a model method into transaction'''

  @functools.wraps(fn)
  def _transaction(self, *args, **kwargs):
    self._db.begin()
    try:
      result = fn(self, *args, **kwargs)
      self._db.commit()
    except:
      self._db.rollback()
      raise
    else:
      return result

  return _transaction

Implicit transaction completion

MySQL can implicitly (i.e. without your direct command), commit or roll back a transaction. If you use nested transactions and have the following exception, it’s very likely it is the case.

OperationalError: (1305, 'SAVEPOINT LEVEL1 does not exist')

MySQL implicitly commits a transaction in case of most DDLs, system tables modification, locking statements, data loading statements, administrative statements and more [12]. Implicit rollback occurs when [13]:

  • Transaction deadlock is detected
  • Lock wait is timed out (until 5.0.13 or if configured explicitly thereafter)
  • Reconnection happened

The missing savepoint error is very likely originated from a deadlock. Take a look at myquerybuilder.test.builder.TestQueryBuilderTransaction for simulated implicit transaction completion.

Concurrency

Underlying MySQL adapters are not thread-safe so isn’t the library. If you will share the query builder instances, and thus the DBAPI connection object, among threads most likely outcome for a libmysqlclient-based adapter is segmentation fault and crash of whole Python process. For pure-Python adapter you will face weird behaviour, lost data and out-of-sync errors.

Suggested way is one of the simplest. Use thread-mapped persistent connections. It is perfectly fine to have several dozens of established MySQL connections [14]. This way any overhead related to establishing connection is eliminated and you get your data as soon as possible.

Though a transparent connection pooling or thread-mapping may appear in future releases as it simplifies things.

Persistent connection

Persistent connection to MySQL server has its performance benefits, which are paid off by the need to maintain the state and availability. For figuring out whether connection is alive while staying idle for a long time, ping MySQL API function is used. It is also the reconnection function. The library’s counterpart is ping().

Note

In case of web application and persistent connection, you need to call qb.ping(True) before processing a request to ensure that connection is alive and to reconnect if it was lost.

Alternatively, you can wrap your public methods or classes with decorator as follows:

def ping(clsOrFn):
  '''Class or function decorator that pings database connecting before execution
  of the function(s)'''

  if isinstance(clsOrFn, type):
    for name, member in clsOrFn.__dict__.items():
      if not name.startswith('_') and isinstance(member, types.FunctionType):
        setattr(clsOrFn, name, ping(member))

    return clsOrFn
  elif isinstance(clsOrFn, types.FunctionType):
    @functools.wraps(clsOrFn)
    def _ping(self, *args, **kwargs):
      self._db.ping()
      return clsOrFn(self, *args, **kwargs)

    return _ping
  else:
    raise TypeError

MySQL also has auto-reconnection feature [16]. It was even enabled by default from MySQL 5.0.0 to 5.0.3. It allows silent reconnection on any query when the client has found that the connection was lost. It may sound like a handy thing, but in fact it is a dangerous and discouraged one.

When reconnection does occur, no matter silent or deliberate, the connection’s state is severely affected, including but not limited to:

  • Active transaction is rolled back
  • All table locks are released
  • All temporary tables are closed and dropped
  • User variable settings are lost

In worst case the server may not yet know that the client has lost connection, thus the client that has just reconnected may find itself in a separate to its old transaction.

The subtle point with auto-reconnection and pinging is that MySQL-python (and mysqlclient as a fork) accepts reconnect argument for connection’s method ping, which has a side-effect that sets connection-wide auto-reconnection behaviour. In other words:

import MySQLdb
conn = MySQLdb.connect(user = 'guest')
conn.ping(True)

And your connection works in auto-reconnection mode and you should expect the unexpected later on. It is subtle because ping’s docstring is the only source and it’s defined in _mysql.c, so you see one only if you ask it, help(conn.ping). There’s just a brief remark:

Accepts an optional reconnect parameter. If True, then the client will attempt reconnection. Note that this setting is persistent.

Therefore, to avoid the persistent side-effect when called with True reconnection argument ping() calls underlying ping method with False second time.

For simulated behaviour, see myquerybuilder.test.builder.TestQueryBuilder.testPing.

MySQL versus

Here goes a note about what makes MySQL (which usually should be read as InnoDB) “special” and why it has been chosen for the library in the first place. Or in other words why not PostgreSQL? which you may hear claimed superior and “true” database (as opposite to MySQL) here and there.

I can’t say anything good or bad about PostgreSQL. It is just cut off. That is when one submits to Occam’s razor to gain mastery with one thing, rather what acquaintance with many. Thus nothing makes it special in the first place but the historic choice. However I will quote a translation of one good related comment I read long ago, which has only survived in my personal correspondence:

You may notice that Google hasn’t overgrown MySQL and still uses it in replicated setup with hundreds of geographically distributed servers. Makes sense? Why do Facebook, Yahoo and other giants nearly the same thing? So what makes your project so special that you say you have overgrown it?

There is no doubt that true scalability is sharding. Full-featured SQL, when you never shy to JOIN any table, unfortunately, doesn’t scale. In no way. Just forget about it. Eventually, you will be sharding your data and your business logic will retrace back to middleware. Where it belongs, though.

Of course it disturbs “true” RDBMS vendors who wish as most business logic as possible to reside inside the database. Otherwise they have nothing to sell you. So far as business logic lives in middleware, only a reliable storage is needed. This is what MySQL can cope with. Don’t trust me? Ask Google.

[1]https://www.python.org/dev/peps/pep-0249/
[2]http://stackoverflow.com/q/1028671/2072035
[3]http://stackoverflow.com/q/1451782/2072035
[4]http://stackoverflow.com/q/14445090/2072035
[5]http://dev.mysql.com/doc/refman/5.5/en/set-transaction.html#isolevel_repeatable-read
[6]http://stackoverflow.com/q/25991345/2072035
[7]https://blogs.oracle.com/mysqlinnodb/entry/better_scaling_of_read_only
[8]https://mariadb.com/blog/every-select-your-python-program-may-acquire-metadata-lock
[9]https://docs.djangoproject.com/en/1.7/topics/db/transactions/#autocommit
[10]http://dev.mysql.com/doc/refman/5.5/en/savepoint.html
[11]http://en.wikipedia.org/wiki/ACID
[12]http://dev.mysql.com/doc/refman/5.5/en/implicit-commit.html
[13]http://dev.mysql.com/doc/refman/5.0/en/innodb-error-handling.html
[14]http://stackoverflow.com/a/99565/2072035
[15]https://docs.python.org/2/reference/datamodel.html#object.__del__
[16]http://dev.mysql.com/doc/refman/5.0/en/auto-reconnect.html
[17]http://dimitrik.free.fr/blog/archives/02-01-2015_02-28-2015.html