PyTable is a relational database wrapper based on BasicProperty's rich
property-based modeling system. PyTable provides a "thick"
wrapper for PostgreSQL (and MySQL) database adapters which fills in
missing functionality in the adapter layer to provide a uniform rich
external API.
PyTable is not an Object-Relational mapper as such. Instead, it is a
middle-level interface which takes care of much of the bookkeeping
normally required of an application using the DB-API. Code
using PyTable is still focused on RDBMS development; it simply gains a
number of considerable shortcuts in the development process.
PyTable was originally created as part of the wxPython Properties
Distribution, as an attempt to enable simple RDBMS-based
applications. The code is under daily use, but is not yet widely
used. It is released under a BSD-style license. PyTable can be
downloaded from the project page on SourceForge. You will need the
BasicProperty package installed to use it. There's not a lot of
documentation at the moment, but the pydoc reference is available.
PyTable is not a very large project. It's basically just a
wrapper around the DB-API to make DB-API-like programming a little
easier. However, the patterns of programming you use with PyTable
tend to differ from those you'd use with raw DB-API code, as the
bookkeeping services of PyTable make it very convenient to refer to the
schema objects in order to direct and control your SQL queries.
PyTable generally uses propertied objects to represent any
database-related object. This includes the specifiers describing a
database connection. The DBSpecifier object is responsible for
looking up the appropriate database driver, and using that driver to
create a database connection.
For example, let's create a database specifier for connecting to our
local database:
"""Simple connection example
This just shows how to use a dbspecifier
object to connect to a particular database.
Note: you need a database named test running
on the specified host, with given user and
password to run this script!
"""
from pytable import dbspecifier
specifier = dbspecifier.DBSpecifier(
    drivername = "PyPgSQL",
    host = "localhost",
    user = "test",
    password = "password",
    database = "test",
)
driver, connection = specifier.connect( )
print "Driver:", driver
print "Connection:", connection
Now, just to show that we're really a regular DB-API-style interface
underneath, we'll do something that you would normally never do with
PyTable: use the raw connection/cursor to prove that the
connection is active.
cursor = connection.cursor()
cursor.execute( """SELECT 42;""" )
print "Life:", cursor.fetchall()
And that's it. Selecting a different database is simply a
matter of specifying a different drivername. You can store
DBSpecifier objects in pickles, so you can allow your users to edit
them and save the edited values to reconnect to the same database.
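For instance, here is a minimal sketch of round-tripping a specifier through a pickle (reusing the illustrative connection values from above):

import pickle
from pytable import dbspecifier
specifier = dbspecifier.DBSpecifier(
    drivername = "PyPgSQL",
    host = "localhost",
    user = "test",
    password = "password",
    database = "test",
)
# serialise the specifier, e.g. for storage in a configuration file
data = pickle.dumps( specifier )
# ...later, restore it and reconnect to the same database
restored = pickle.loads( data )
driver, connection = restored.connect()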
Driver Name | Implementation Class | Description
---|---|---
SQLite | pytable.pysqlite.sqlitedriver.SQLiteDriver | SQLite via PySQLite |
MkSQL | pytable.mk.mkdriver.MkDriver | Metakit via MkSQL (unfinished) |
PyPgSQL | pytable.pypgsql.pgdriver.PGDriver | PostgreSQL via PyPgSQL |
PyGreSQL | pytable.pygresql.pgdriver.PGDriver | PostgreSQL via PyGreSQL |
psycopg | pytable.psycopg.psycodriver.PsycoDriver | PostgreSQL via psycopg |
MySQL | pytable.mysql.mydriver.MyDriver | MySQL via MySQLdb |
Most of the time when using PyTable, you work with a predefined
database schema tree. This tree allows for looking up particular
tables/fields/constraints at runtime, associating particular
row/result-set classes with tables, and generally storing most of the
relevant information about your database design in a convenient
in-memory format.
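For instance, once the example schema defined below has been imported, the tree can be walked directly; this is a minimal sketch using only the properties set in that example:

import example_schema
for table in example_schema.schema.tables:
    print 'Table:', table.name, '-', table.comment
    for field in table.fields:
        print '    Field:', field.name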
For most application development, you create the database schema
directly, specifying each table and field in the Python code,
specifying constraints and comments regarding each table as you
go. This example shows how you would create such a schema.
"""Example of manually creating a database schema
There are two normal ways to create database schemas
for use with pytable. This approach, (manually creating
the description using the schemabuilder module) is the
more common "application" development pattern, while the
reverse engineering mechanism is more commonly used for
"scripts" which need to deal with existing databases.
"""
from pytable.schemabuilder import *
The schemabuilder module
is just a convenience wrapper around the classes defined in the
dbschema module. You will likely want to refer to the dbschema pydoc
reference to see
all the various properties associated with the various schema
sub-types. You can also check the table at the end of this
section which shows the various schema classes and their associated
schemabuilder aliases.
schema = database(
    name = "test",
    comment = """A simple testing database""",
    # most schema objects have a comment property; currently these are
    # used solely for documentation purposes, but eventually they may
    # generate "COMMENT" declarations in the database
    tables = [
        table(
            "pets",
            comment = """Storage for simple pets information""",
            fields = [
                field(
                    # the positional arguments to field() are: field-name,
                    # SQL-field-type, SQL-field-size, and comment
                    "pet_name", "text", 0, """The name of the pet""",
                    # note the use of raw SQL syntax for the defaultValue
                    # declaration; this allows you to include arbitrary SQL
                    # code, including calls out to sequences, functions
                    # and the like
                    defaultValue = "'stringValue'",
                    # constraints on the field are specified as a list of
                    # constraint objects
                    constraints = [ primary(), notNull() ],
                ),
                field(
                    "pet_age", "integer", 0, """The age of the pet""",
                    constraints = [ notNull() ],
                ),
            ],
        ),
        table(
            "houses",
            comment = """Storage for simple house information""",
            fields = [
                field(
                    # the serial data type limits this schema to use on
                    # PostgreSQL databases; PyTable doesn't know anything
                    # about the data type, and won't warn you about such
                    # situations until you try to move the schema to
                    # another database
                    "house_id", "serial", 0, """Unique house identifier""",
                    constraints = [ primary(), notNull() ],
                ),
                field(
                    "type", "text", 0, """The type of the house""",
                    constraints = [ notNull() ],
                ),
            ],
            # the defaultRecords property stores a list of dictionaries
            # (or dictionary-like objects) which are used to generate
            # INSERT statements during SQL generation; the INSERT
            # statements are produced after all other table-related
            # statements as of version 0.5.9
            defaultRecords = [
                { 'type': 'cage' },
                { 'type': 'dog house' },
                { 'type': 'dog basket' },
                { 'type': 'cat basket' },
                { 'type': 'bowl' },
                { 'type': 'aquarium' },
            ],
            indices = [
                index( unique=1, fields=('type', ) ),
            ],
        ),
        table(
            "house_pets",
            comment = """Stupid word-play mapping pet: house""",
            fields = [
                field(
                    'house_id', 'integer', 0, """Reference to the house""",
                    constraints = [
                        # foreign-key constraints are specified much the
                        # same as any other constraint: the first two
                        # positional arguments are the foreign table and
                        # the list of foreign fields to which the key
                        # refers; when using the primary key of the
                        # foreign table (as here) you may omit the foreign
                        # fields argument entirely; the onDelete and
                        # onUpdate properties specify the reactions to
                        # referenced-record deletion and referenced-record
                        # key-value changes
                        foreignKey(
                            "houses", # uses primary key by default...
                            onDelete = "CASCADE",
                            onUpdate = "CASCADE",
                        ),
                        notNull(),
                    ],
                ),
                field(
                    'pet_name', 'text', 0, """Reference to the pet""",
                ),
            ],
            # table constraints are specified in almost the same way as
            # field constraints, the difference being that the "fields"
            # property of the constraint becomes necessary to specify
            # which fields are being constrained
            constraints = [
                foreignKey(
                    "pets", # foreign table which constrains
                    ("pet_name",), # foreign fields which constrain
                    fields = ("pet_name",), # local fields constrained
                    onDelete = "SET NULL",
                    onUpdate = "CASCADE",
                ),
            ],
        ),
    ],
)
Class | Schema Builder Names (definition) | Description
---|---|---
DatabaseSchema | database( name, tables=(), comment="", **named ) | Schema for an overall database object
IndexSchema | index( fields=(), unique=1, **named ) | Schema for defining a particular index (on a particular table)
TableSchema | table( name, fields=(), comment="", **named ) | High-level representation of a table's structure/schema
FieldSchema | field( name, dbDataType, displaySize=0, comment="", **named ) | Schema for a particular field of a table
SequenceSchema | sequence( name, comment="", **named ) | A sequence object used for implementing e.g. serial fields
ForeignKeyConstraint | foreignKey( foreignTable, foreignFields=(), comment="", **named ) | Foreign-key constraint for a field or table
CheckConstraint | check( expression, **named ) | An SQL-statement CHECK constraint
PrimaryConstraint | primary, primaryKey | (Possibly multi-field) primary-key unique constraint
UniqueConstraint | unique | (Possibly multi-field) unique-value constraint
NotNullConstraint | notNull | A field's not-null constraint
Since we have manually created our schema above, we would likely
want to generate SQL statements for building the schema inside a
database. This process is fairly straightforward: an
SQLCreateStatements object is instantiated and then called with the
schema as argument. It produces a sequence of SQL statements
which will create the schema through a database connection (note that
the SQL statements do not include creating the database itself, only
the tables and sequences within the database schema).
"""Generate DB structure from manual declaration
This example imports the previously created
example schema and generates the SQL statements
required to build that schema.
The sqlgeneration module provides the actual SQL
statement generation. The sqlquery module is
used to run the SQL statements.
"""
from pytable import dbspecifier, sqlgeneration, sqlquery
import example_schema
specifier = dbspecifier.DBSpecifier(
    drivername = "PyPgSQL",
    host = "localhost",
    user = "test",
    password = "password",
    database = "test",
)
driver, connection = specifier.connect()
generator = sqlgeneration.SQLCreateStatements(
    # driver can be used to customise what's generated...
    driver,
)
# call the generator with the schema as argument
statements = generator( example_schema.schema )
for statement in statements:
    print statement # just so we see what's being executed
    sqlquery.SQLQuery( sql = statement )( connection )
# as with the DB-API, autoCommit is false by default, so
# everything we've done will disappear when we exit...
In real world situations, you would normally use: SQLDropStatements,
SQLCreateStatements, SQLGrantStatements and SQLRevokeStatements (in
order) to drop any previous table/sequences, and establish some useful
access restrictions. All of the statement generators work in the
same way, save that Grant and Revoke have the properties: privileges (a list
of privilege-description strings), users (a list of user/group names) and
isGroup (whether the names are groups or users).
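For instance, here is a minimal sketch of such a rebuild sequence. The privilege strings and user name are illustrative values, and the sketch assumes the Grant generator accepts its properties as constructor keywords in the same propertied-object style as the other examples:

from pytable import sqlgeneration, sqlquery
import example_schema

def rebuild( connection, driver ):
    """Drop, recreate and grant access to the example schema"""
    drop = sqlgeneration.SQLDropStatements( driver )
    create = sqlgeneration.SQLCreateStatements( driver )
    grant = sqlgeneration.SQLGrantStatements(
        driver,
        privileges = [ 'SELECT', 'INSERT', 'UPDATE', 'DELETE' ], # illustrative
        users = [ 'test' ], # illustrative user name
        isGroup = 0,
    )
    for generator in ( drop, create, grant ):
        for statement in generator( example_schema.schema ):
            sqlquery.SQLQuery( sql = statement )( connection )
    connection.commit() # make the rebuild permanent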
Generating SQL query strings tends to be a rather tedious and
error-prone task. However, in the end, people working with databases
professionally do tend to know how to sling SQL, so we'd like to let
them use that SQL knowledge to query their databases. In
particular, we need a way to build up query strings from a number of
different types of elements: fragments of raw SQL syntax (such as
WHERE clauses) and data values which must be escaped by the driver.
SQLQuery objects therefore have two levels of escaping/substitution of
named parameters: a first pass substitutes SQL fragments (marked with a
single %) into the query string, and a second pass hands data values
(marked with a doubled %%) to the driver for proper escaping. In
addition, SQLQuery objects are callable objects which can have their
__call__ method overridden, and/or can have their "processResults"
method overridden to alter how the result cursor is treated.
Here's a simple example of using an SQLQuery object without
subclassing:
import build, example_schema
from pytable import sqlquery
query = sqlquery.SQLQuery(
    sql = """INSERT INTO
        pets(pet_name,pet_age)
    VALUES
        (%%(pet_name)s,%%(pet_age)s);""",
)
Note the use of %% formatting for raw data-values. We don't
want to substitute the values in as raw SQL (a single %); we want them
to go through the driver's data-value escaping.
query(
    build.connection, # a connection or cursor object must be the first parameter
    pet_name = 'Tweety',
    pet_age = 3,
)
cursor = sqlquery.SQLQuery(
    # note the single % for fragments of SQL syntax
    sql = """SELECT * FROM pets %(whereClause)s;""",
)(
    build.connection,
    # within fragments which are substituted into the query, data-value
    # substitution parameters need only a single %: the fragment shows up
    # after the first substitution pass, so it needs only single escaping
    whereClause = "WHERE pet_name = %(name)s",
    name = "Tweety",
)
for row in cursor:
    print 'Row:', row
This last loop works because the DBCursor class provides an iteration
method.
The SQLQuery object has two properties, sql, which
stores the SQL query string, and debug, which is a boolean indicating
whether the query should print debugging information to the console
when running. The debugging information is the query after its
first substitution pass (i.e. just before the driver is given the
data-values to escape), and the set of arguments to the query at that
point. More precisely, the debug print happens just before the
cursor.execute operation.
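For instance (a minimal sketch, reusing the connection from the examples above):

query = sqlquery.SQLQuery(
    sql = """SELECT * FROM pets;""",
    debug = 1, # print the substituted query and arguments before execution
)
cursor = query( build.connection )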
SQLQuery objects are designed to be sub-classed. In particular,
we would often like to be able to specify pre-processing of the
query's arguments and post-processing of the query's results.
Normally we sub-class the query objects in order to create a
reusable query that needs a great deal of parameterisation or
post-processing of the results and which is used from a number of
different places. Customising the SQLQuery object normally occurs
by overriding either __call__ or processResults.
The first sample demonstrates overriding __call__ to pre-process
query arguments, in this case, to allow either the primary key of a
table, or a row-record for the table to be passed as the determinant
for a WHERE clause.
from pytable import sqlquery
class FindHouses( sqlquery.SQLQuery ):
    """Queries database for houses by pet_name or pet"""
    sql = """SELECT h.*
    FROM
        houses h, house_pets hp
    WHERE
        hp.pet_name = %%(pet)s -- note use of doubled percent signs!
        AND
        h.house_id = hp.house_id
    ;"""
    def __call__( self, cursor, pet, **named ):
        """Coerce pet to pet_name and then execute query"""
        if not isinstance( pet, (str,unicode) ):
            name = getattr( pet, 'pet_name', None )
            if name is None or not isinstance( name, (str,unicode) ):
                raise TypeError(
                    """%r instance %s could not be coerced to a pet_name"""%(
                        pet.__class__, pet,
                    )
                )
            pet = name
        return super( FindHouses, self ).__call__(
            cursor,
            pet = pet,
            **named
        )
Note that the default sql property of the sqlquery.SQLQuery class is being overridden by the "sql" class attribute defined in the subclass.
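A hypothetical invocation, assuming the connection and records from the earlier examples:

finder = FindHouses()
result = finder( build.connection, pet = 'Tweety' ) # a name string...
# ...or any object with a pet_name attribute, such as a pets row-record
for row in result:
    print 'House:', row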
The second sample demonstrates overriding processResults to return
something other than the default cursor object from a query:
class ListDatabases( sqlquery.SQLQuery ):
    """Queries PostgreSQL server for list of database-names

    returns a simple list of string names
    """
    sql = """SELECT datname FROM pg_database
        WHERE datname != 'template1' AND datname != 'template0'
    """
    def processResults( self, cursor, **namedarguments ):
        """Read database name list from cursor"""
        return [ row[0] for row in cursor.fetchall() ]
processResults receives the cursor which executed the query as the
first positional argument, and all other arguments to the query as
named arguments.
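A hypothetical invocation, again assuming an open PostgreSQL connection from the earlier examples:

lister = ListDatabases()
print 'Databases:', lister( build.connection )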
The results of an SQLQuery are normally a DBCursor object.
This is
a fairly minimal wrapper around a DB-API cursor, just providing
generator-based iteration and a pointer back to the connection from
which the cursor
was created. Often we want a more list-like operation, particularly the
ability to access results in random-access order. We'd often like
to be able to do this without loading the whole result-set at once as
well.
LazyResultSet provides these semantics, loading and wrapping rows
from the cursor and caching the wrapped rows to allow for random-access
operation. The DBResultSet object simply adds a pointer to a
table-schema to which attribute access is delegated.
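The caching idea can be sketched generically like so; this is an illustration of the strategy only, not PyTable's actual implementation (see the pydoc reference for the real classes):

class LazyRows:
    """Illustrative sketch: cache rows fetched on demand from a cursor"""
    def __init__( self, cursor ):
        self.cursor = cursor
        self.cache = []
    def __getitem__( self, index ):
        # fetch and cache rows until the requested index is available
        while index >= len( self.cache ):
            row = self.cursor.fetchone()
            if row is None:
                raise IndexError( index )
            self.cache.append( row )
        return self.cache[ index ]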
(Still to be documented: the TableSchema and FieldSchema objects and their methods, as well as DBRow and DBProperty.)
PyTable has support for querying PostgreSQL, SQLite and MySQL
databases to obtain general schema information about the current
database. These queries include listing of active databases,
tables and fields within those tables. Introspection facilities
are used like so:
driver, connection = specifier.connect()
if hasattr( driver, 'listDatabases' ):
    result = driver.listDatabases( connection )
    assert result, """0 databases on server? %s"""%(specifier)
if hasattr( driver, 'listTables' ):
    tables = driver.listTables( connection )
if hasattr( driver, 'listIndices' ):
    for table in tables:
        result = driver.listIndices( connection, tableName=table )
if hasattr( driver, 'attrDescription' ):
    for table in tables:
        cursor = driver.attrDescription( connection, tableName=table )
if hasattr( driver, 'attrDefault' ):
    for table in tables:
        cursor = driver.attrDefault( connection, tableName=table )
There are two projects within the PyTable SourceForge project, the
core PyTable package, and a Zope Database Adapter (DA). This
changelog does not cover the project changes from before the split out
of the wxPython Properties Distribution.
Version 0.8.3
Version 0.8.2
Version 0.8.1
Version 0.8.0
Version 0.7.12
Version 0.7.11
You should now be able to run:
test.py -dtest
(assuming a local connection that doesn't require authentication) and have the test-suite pass.
Version 0.7.10
Version 0.7.9
Version 0.7.8
Version 0.7.7
Version 0.7.6
Version 0.7.5
Version 0.7.2
Version 0.7.1
Version 0.7.0
Version 0.6.7
Version 0.6.6
Version 0.6.5
Version 0.5.9
Version 0.5.8
Version 0.5.7
Version 0.1.3
Version 0.1.2
Version 0.1.1
Version 0.1.0
PyTable RDBMS Middleware
Copyright (c) 2002-2004, Michael C. Fletcher
All rights reserved.
THIS SOFTWARE IS NOT FAULT TOLERANT AND SHOULD NOT BE USED IN ANY
SITUATION ENDANGERING HUMAN LIFE OR PROPERTY.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials
provided with the distribution.
The name of Michael C. Fletcher may not be used to endorse or
promote products derived from this software without specific
prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDERS AND CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.