SQLAlchemy Frequently Asked Questions
- General
- Setting Up/Connecting
- How do I configure logging ?
- The "help" keyword (i.e. help(sqlalchemy)) and "pydoc" commands dont work
- Do I have to do something special to use connection pooling ? Are my connections pooled ?
- How do i specify (insert custom DBAPI argument here) to my connect() function?
- MySQL server has gone away / psycopg.InterfaceError?: connection already closed
- MetaData / Schema
- I am using autoload=True and <insert your problem here>
- My program is hanging when I say table.drop()/metadata.drop_all()
- Does SQLAlchemy support ALTER TABLE, CREATE VIEW, CREATE TRIGGER, Schema Upgrade Functionality ?
- SQLAlchemy gets confused when i try to drop arbitrary database objects that have dependencies on other things it doesn't know about
- How can I sort Table objects in order of their dependency?
- How can I get the CREATE TABLE/ DROP TABLE output as a string ?
- How do I map a column that is a Python reserved word or already used by SA?
- SQL/Transactions
- Object Relational Mapping
- How do I attach an aggregate column (or other SQL expression) to an ORM Query? i.e. max(), count(), x*y etc.
- FlushError: instance <someinstance> is an unsaved, pending instance and is an orphan
- I have a schema where my table doesnt have a primary key, can SA's ORM handle it?
- a single object's primary key can change, can SA's ORM handle it?
- why isnt my __init__ called when I load objects?
- how do I use ON DELETE CASCADE with SA's ORM ?
- I've created a mapper against an Outer Join, and while the query returns rows, no objects are returned. Why not ?
- I'm using "lazy=False" to create a JOIN/OUTER JOIN and SQLAlchemy is not constructing the query when I try to add a WHERE, ORDER BY, LIMIT, etc. (which relies upon the (OUTER) JOIN)
- I set the "foo_id" attribute on my instance to "7", but the "foo" attribute is still None - shouldn't it have loaded Foo with id #7 ?
- How Do I use Textual SQL with ORM Queries ?
- Why is my PickleType column issuing unnecessary UPDATE statements ?
- Is there a way to automagically have only unique keywords (or other kinds of objects) without doing a query for the keyword and getting a reference to the row containing that keyword?
- Integrating with external tools
General
Whos using SQLAlchemy?
We have a wiki page set up at Applications and Sites which collects everything going on right now.
What license is SQLAlchemy licensed under ?
SQLAlchemy is under the MIT License.
Setting Up/Connecting
How do I configure logging ?
SQLAlchemy as of 0.3.0 uses Python's logging module for logging. This is enabled via the standard logging methodology, using named loggers. SQLAlchemy logs database conversations underneath the sqlalchemy.engine namespace, and the ORM dumps debugging messages to a variety of names underneath the sqlalchemy.orm namespace.
A generic logging setup looks like:
import logging
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
logging.getLogger('sqlalchemy.orm').setLevel(logging.INFO)
Also note that using echo=True on the create_engine or create_session functions will try to set up its own logging configuration, which may conflict with an explicit logger config.
The "help" keyword (i.e. help(sqlalchemy)) and "pydoc" commands dont work
This is a known Python bug which is fixed in version 2.5:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=1506951&group_id=5470
Do I have to do something special to use connection pooling ? Are my connections pooled ?
No and yes. If you have called create_engine(), the returned Engine object has a reference to a connection pool in all cases. The class used for this pool in most cases is sqlalchemy.pool.QueuePool, but in the case of SQLite is sqlalchemy.pool.SingletonThreadPool. The type of pool used as well as its characteristics can be controlled, see the docs.
How do i specify (insert custom DBAPI argument here) to my connect() function?
Method #1:
e = create_engine('postgres://url', connect_args={'sslmode':True})
Method #2:
e = create_engine('postgres://url?sslmode=something')
Method #3:
def conn():
return psycopg.connect(url, arg1, arg2, kwarg1=foo, kwarg2=bar)
e = create_engine('postgres://', creator=conn)
MySQL server has gone away / psycopg.InterfaceError?: connection already closed
This is usually symptomatic of the database closing connections which have been idle for some period of time. On MySQL, this defaults to eight hours.
Use the pool_recycle setting on the create_engine() call, which is a fixed number of seconds for which an underlying pooled connection will be closed and then re-opened. Note that this recycling only occurs at the point at which the connection is checked out from the pool; meaning if you hold a connection checked out from the pool for a duration greater than the timeout value, the timeout will not work. (it should be noted that it is generally bad practice for a web application to hold a single connection open globally; Connection objects should be obtained via connect() and closed/removed from scope as needed).
Note: With MySQLdb specifically, this error was also occuring due to sloppy re-use of connections within SA's connection pool. As of SA 0.3.3 these issues have been fixed.
MetaData / Schema
I am using autoload=True and <insert your problem here>
The autoload feature, which is also known as table reflection, is by far the most troublesome feature of SA not because SA has skimped on it, but because it encompasses the widest number of variables, including not only the specific quirks of a particular schema, but also the myriad quirks/limitations/"features" of the database, the platform it runs on, the DBAPI driver, and the specific versions and configurations of both. Lots of scenarios simply are not possible in certain scenarios, such as case-sensitive schemas, foreign key reflection, etc.
Before reporting an issue with reflection, please read fully the notes for your particular database over on the DatabaseNotes page to see if its a known issue.
It should be noted that no application really needs reflection in order to function....so while a reflection issue is frustrating, keep in mind that to work around it, just define the Table and Column objects explicitly...which is how many other ORMs such as Hibernate require you to work anyway.
My program is hanging when I say table.drop()/metadata.drop_all()
This usually corresponds to two conditions: 1. using postgres, which is really strict about table locks, and 2. you have a connection still open which contains locks on the table and is distinct from the connection being used for the DROP statement. Heres the most minimal version of the pattern:
result = mytable.select() mytable.drop()
the "result" from mytable.select() still holds onto a connection, which was pulled from the connection pool. Saying mytable.drop() then pulls a second connection from the pool, and hangs because postgres still has open table locks in the first connection.
Two solutions:
1. close the result. this will return the underlying connection to the connection pool, which issues an unconditional ROLLBACK statement on all returned connections, thus releasing any locks.
result = mytable.select() result.close() mytable.drop()
2. use the same connection for both operations
conn = engine.connect() result = conn.execute(mytable.select()) mytable.drop(connectable=conn)
Does SQLAlchemy support ALTER TABLE, CREATE VIEW, CREATE TRIGGER, Schema Upgrade Functionality ?
The only instance where SA automatically renders ALTER TABLE is when you use the use_alter flag on a ForeignKey construct, which is used to create tables with mutually dependent foreign keys.
Everything else related to schemas and things beyond basic table creates and drops, such as CREATE SCHEMA, CREATE VIEW, CREATE PROCEDURE, aren't "built in", but as of 0.4.3 the DDL() construct is ideal for specifying statements of this nature, and can be attached to Table or MetaData objects to be executed at specific times during create or drop calls.
The Migrate project at http://code.google.com/p/sqlalchemy-migrate/ also seeks to provide a full suite of schema migration utilities.
SQLAlchemy gets confused when i try to drop arbitrary database objects that have dependencies on other things it doesn't know about
Like the previous question, SA is not a comprehensive DDL management solution and does not intend to be (however if you were to write a DDL management solution in Python, SQLAlchemy would be a great foundation to start with). It does not have the ability to magically figure out everything there is to know about a database, nor can it strategize the creation/dropping of objects beyond its basic topological sort of a set of Table objects within a single MetaData.
How can I sort Table objects in order of their dependency?
metadata = MetaData() # ... add Table objects to metadata ti = metadata.table_iterator() for t in ti: print t
As of the sort_tables function is available individually:
from sqlalchemy.sql.util import sort_tables for table in sort_tables([t1, t2, t3], reverse=True): print table.description
How can I get the CREATE TABLE/ DROP TABLE output as a string ?
The "mock" engine strategy supports sending a callable which will intercept all execute calls.
from sqlalchemy import * from StringIO import StringIO buf = StringIO() engine = create_engine('postgres://', strategy='mock', executor=lambda s, p='': buf.write(s + p)) meta = MetaData() t1 = Table('sometable', meta, Column('foo', String(30))) meta.create_all(engine) print buf.getvalue()
How do I map a column that is a Python reserved word or already used by SA?
table_a = Table ('tbl_a', metadata, autoload=True)
mapper(AObj, table_a, properties={'type_col' : table_a.c.type, 'pass_col':table_a.c['pass']})
SQL/Transactions
I am using multiple connections with a SQLite database (typically to test transaction operation), and my test program is not working !
SQLite by default uses SingletonThreadPool to manage connections. This means that within a single thread, the connection pool will return the same connection instance in all cases. There are several reasons for this. The best one is that when using a memory: database its required, since a new connection to memory: gives a new database. Another is that with file-based databases theres usually not much reason you'd want multiple connections to the same file within a single thread, and some SQLite versions can deadlock since the database file is locked on write operations (second connection tries to execute in the same thread -> deadlock). A third reason is that many versions of SQLite also do not allow connection instances to be moved between threads (often due to the non-threadsafety of flock()), hence the regular multithreaded connection pool will fail.
Since SQLite changes frequently, there may be options/switches to enable regular multithreaded communication with a particular SQLite file-based database. If this is your case, you can instruct the create_engine call to use the normal connection pool as follows:
from sqlalchemy import pool
e = create_engine('sqlite:///foo.db', poolclass=pool.QueuePool)
SQLite: SQL logic error or missing database
related to the previous FAQ entry, this error occurs when results still remain on a select statement, the ResultProxy from that statement has not been closed, and you have tried to issue another command (like an INSERT) to the SQLite database. It goes to use the same connection (since only one connection per thread) and fails. Ensure that all previous result sets are fully fetched and/or the results are closed using close().
How do I get at the raw DBAPI connection when using an Engine ?
With a regular SA engine-level Connection, you can get at the psycopg2 connection via conn.connection. the connection and cursor returned by cursor() are proxied for garbage collection/connection pool purposes but call upon the underlying DBAPI API for their behavior and methods:
engine = create_engine(...) conn = engine.connect() conn.connection.<do DBAPI things> cursor = conn.connection.cursor(<DBAPI specific arguments..>)
you also want to make sure you revert any isolation level settings on the connection back to normal before returning it to the pool.
or, as an alternative to that, you can call the detach() method on either Connection or the proxied connection, which will remove it from the pool; it gets closed for real when you close() it and whatever changes you've made on it don't pollute the overall pool of connections:
conn = engine.connect() conn.detach() # detaches the DBAPI connection from the connection pool conn.connection.<go nuts> conn.close() # connection is closed for real, the pool replaces it with a new connection
Object Relational Mapping
How do I attach an aggregate column (or other SQL expression) to an ORM Query? i.e. max(), count(), x*y etc.
At the query level the column can be added using add_column(), and then adding in the appropriate join and group by criterion:
session.query(MyClass).group_by([c for c in MyClass.c]).add_column(func.count(MyClass.c.whatever).label( 'count')).list()
Additional criterion can be added using generative functions like join(), filter(), filter_by(), etc. Such as if the aggregate function were against a different table, these functions would be used to add the appropriate join criterion to the second table.
The returned result will be tuples consisting of instances and scalars, i.e.
[ (<MyClass instance>, 5), (<MyClass instance>, 12) ]
Other ways to add aggregates include:
- creating the mapper() against the desired select()
q = select([mytable, func.count(mytable.c.somecolumn)], group_by=[c for c in mytable.c if c is not mytable.c.somecolumn]) mapper(MyClass, q)
- adding the expression using column_property():
mapper(MyClass, mytable, properties={ 'count':column_property(select([func.count(othertable.c.related_id)], mytable.c.id==othertable.c.related_id, scalar=True).label('count')), 'product':column_property((mytable.c.somecol * 5).label('product')) })
- using the desired select() directly with Query:
q = select([mytable, func.count(somecolumn)], group_by=[c for c in mytable.c]) result = session.query(MyClass).select(q) # or result = session.query(MyClass).instances(q.execute())
Of course, if the aggregate value need not be queried inline with the main mapped query, you can just initiate a regular SQL query for the count value (which can be assembled into a class property easy access).
FlushError: instance <someinstance> is an unsaved, pending instance and is an orphan
This corresponds to the usage of the delete-orphan cascade rule. This rule establishes the "parent" object as required for a particular relationship, such as:
mapper(Parent, sometable, properties={
relation(Child, cascade="all, delete-orphan")
})
With the above relationship, no newly created instance of Child can be flushed without being attached to a Parent.
The error occurs when you do specifically this:
c = Child() session.save(c) session.flush()
I.e., a new Child was created and saved to the session. This establishes its state as "pending". The operation is invalid because "c" cannot be inserted (its an orphan), but also cannot be deleted (its not saved). Since it doesn't make any sense to save() a Child instance without a parent, the error is raised (SQLAlchemy does not want to guess if a strange user operation is intentional or not).
I have a schema where my table doesnt have a primary key, can SA's ORM handle it?
jonathan ellis pointed us all to a great series of articles concerning primary keys:
http://spyced.blogspot.com/2006/07/single-column-primary-keys-should-be.html
the gist of the articles is that *any* concept you are trying to store in a database usually has some kind of unique identifying key.
if you dont have any columns explicitly defined as part of a primary key in your table schema (which you probably should), you can still tell a mapper what columns compose the primary key like this:
mapper(someclass, sometable, primary_key=[sometable.c.uid,sometable.c.bar])
a single object's primary key can change, can SA's ORM handle it?
Yes ! As of version 0.4.2 primary keys on entities can change without issue. After a flush the instance's "identity_key" is updated to reflect the change, and the instance is moved in the identity map to the new identity.
What needs to be kept in mind is that references to related objects, who have a foreign key referencing the primary key which is changing, need to be updated as well. SQLAlchemy will issue the UPDATE statements if the passive_updates=False flag is set on a relation(). However, this flag is not appropriate if the database is actually enforcing referential integrity; in that case, ON DELETE CASCADE must be configured on the database tables themselves so that the database can cascade the update, this includes databases like Postgres, Oracle, and MySQL when InnoDB tables are used.
why isnt my __init__ called when I load objects?
SQLAlchemy instantiates objects that are loaded from the database via the __new__() method, which bypasses the __init__() method. This allows objects to follow the semantics of being stored and later reloaded into memory.
Typically, initialization that takes place in an object's constructor is to set up internal state variables for the first time. It is usually the case that these internal state values are part of the state which is persisted to the database; therefore when the object is loaded again, it would be inappropriate to re-initialize this state since its being loaded from the database !
For example, an Order object used in an ecommerce application:
class Order(object):
def __init__(self, user, items):
self.user = user
self.items = [Item(name) for name in items]
myorder = Order(user, items=['item 1', 'item 2'])
session.save(myorder)
session.flush()
Above, the user and items objects are part of the state which is saved to the database. Additionally, the Order constructor created new Item objects upon construction, which automatically get saved to the session using cascade rules. Its clear that the above Order object would fail badly when reconstructed from a database row, since no arguments would be available. If __init__() were expected to be called with no arguments, constructors would be forced to not have any required arguments and would have to check all incoming arguments for None, which actually was the case in early versions of SA. But also, the Order would have no indication if it were an already-saved instance or a brand new one.
Another way to view it, is that calling __new__ is more equivalent to your objects being "pickled" and restored from their "pickled" state.
In general, having loaded objects created via __new__() alone instead of __init__() allows objects to differentiate between construction and loading.
if you need customized things to be done when the instance is loaded back from the database, the idea is that youd make a MapperExtension with create_instance() overridden, which returns a new instance with whatever else you wanted (i.e. it can just create the object normally):
class MyExt(MapperExtension):
def create_instance(self, mapper, selectcontext, row, class_):
return MyObject()
mapper(MyObject, table, extension=MyExt())
how do I use ON DELETE CASCADE with SA's ORM ?
you should be able to have ON DELETE CASCADE set up in your database normally. however, you have to make sure that the mappers you set up contain the relationships that correspond to those cascade rules. What will happen then is if you mark an object instance as deleted and then flush, SQLAlchemy will also delete all child objects that have been set up with the "delete" cascade rule. In this scenario, SA will do the deletions in the correct order before the database-level CASCADE rule ever gets to it.
SQLAlchemy will either "actively" load in child instances during a cascading delete operation, or "passively" process only those child instances which were already loaded into the session. This setting is controlled by the passive_deletes boolean flag on the relation() function, which defaults to False, indicating that it will "actively" load in child instances for deletion.
Setting this flag to True when ON DELETE CASCADE is configured on the database may save a lot of extra load operations, particularly for large tables.
I've created a mapper against an Outer Join, and while the query returns rows, no objects are returned. Why not ?
The ORM Query object will not return an instance if the row does not contain a full primary key. For an outerjoin such as books_table.outerjoin(fulltext_table), the natural primary key of this join would be the primary key column(s) of books_table + the primary key column(s) of fulltext_table. Such as:
[books_table.c.id, fulltext_table.c.id]
When mapping to a join like this, you have to make the determination what the "primary key" of a particular row from your outerjoin should be. By default, it will be the set of primary key columns from all involved tables, "reduced" down to the minimal set of unique columns among them, as is the case above.
If you decide that the books_table alone determines primary key, then you can override this default using the primary_key argument:
mapper(MyClass, books_table.outerjoin(fulltext_table), primary_key=[books_table.c.id])
If, on the other hand, the combination of columns from both tables do comprise the primary key, since we are mapping to an outerjoin which may contain NULLs for the outer joined table, the mapper needs to be told that the composite primary key may contain a NULL for one or more (but not all) of the columns, but is still a valid primary key. For that you add the flag allow_null_pks=True to your mapper:
mapper(MyClass, books_table.outerjoin(fulltext_table), allow_null_pks=True)
I'm using "lazy=False" to create a JOIN/OUTER JOIN and SQLAlchemy is not constructing the query when I try to add a WHERE, ORDER BY, LIMIT, etc. (which relies upon the (OUTER) JOIN)
If the query you are constructing requires joining to a second table, that is not what eager loading is used for. You instead need to tell your Query object to join explicitly. Pretend the eager loading is not there !
SA's eager loading capability is designed to load full sets of child objects on a parent object, in such a way that it is completely transparent if lazy or eager loading is used. To accomplish this, it uses aliases so that the extra tables are "anonymized" against the normal query criterion, which is why explicit query criterion does not affect them. The goal is that whether or not eager loading is used, the same Query criterion will produce the *identical* result.
For information on how to set up explicit joins, see the tutorial on "Querying with Joins" at http://www.sqlalchemy.org/docs/04/ormtutorial.html#datamapping_joins.
I set the "foo_id" attribute on my instance to "7", but the "foo" attribute is still None - shouldn't it have loaded Foo with id #7 ?
SQLAlchemy doesn't currently tie event handlers to foreign key-holding object attributes. While this is something we may try in a future release, traditionally SA's usage model has focused on dealing with object instances and their direct association with each other in Python, rather than associating them through manual manipulation of foreign key attributes. Particularly with objects that aren't even persisted in the database yet (i.e. transient), we don't like to populate attributes "automagically" based on some other event, with the one notable exception being backreferences via the backref flag.
For this particular behavior, the foreign key attribute would "expire" the related item. So setting foo_id on your Bar to 7 means that Bar.foo gets *deleted*, and will load Foo id #7 the next time you access Bar.foo; otherwise setting a whole set of individual attributes would immediately trigger that many individual SELECT statements and be extremely wasteful.
But, what if you had previously set Bar.foo to something already, which conflicted with id #7, do we just wipe it out, or raise an error ? Furthermore, the presence of the reverse collection of bars on Foo #7 is concerning as well...do we expire that collection? What about other Bar instances which may have been newly added to it? Or, do we immediately flush and reload Foo's collection of Bars ? Again the setting of attributes has created a complex set of decisions to make. For reasons like these, the level of surprise behavior and potential for either wasteful or outright erroneous situations seems high, and in the spirit of "don't guess" we've done just that.
So, when you do set a foreign key attribute and you want to load the related items in, just use session.refresh() or session.expire(). If operating upon the whole instance, issue a flush() first. But, as of 0.4.1 these methods also take a list of individual attribute names as arguments, so you can even do it without flushing, by just expiring the individual attribute:
bar.foo_id = 7 session.expire(bar, ['foo']) # expires the 'foo' attribute on `Bar` print "foo #7:", bar.foo # immediately loads Foo #7
How Do I use Textual SQL with ORM Queries ?
Textual blocks can be assembled at any point within Query. Individual components can be sent to filter():
session.query(User).filter("id<:value and name=:name").\ params(value=224, name='fred').one()
and full statements can be used with from_statement():
session.query(User).from_statement("SELECT * FROM users where name=:name").params(name='ed').all()
Why is my PickleType column issuing unnecessary UPDATE statements ?
When pickling a Python datastructure such as dict or set, Python's pickle module does not produce the same picklestring for the same collection each time. The default behavior of PickleType when used in the ORM is to check for changes by comparing the full pickled string, so that it can detect changes within the structure. To disable this behavior and instead use the equals operator, set up the column using PickleType(mutable=False):
Column('mycol', PickleType(mutable=False))
Note that PickleType was using the is operator in this case, previous to version 0.4.1.
Alternatively, any desired comparison function can be installed on PickleType using the comparator argument:
def mycomparator(x, y): return x == y # ... Column('mycol', PickleType(comparator=mycomparator))
Is there a way to automagically have only unique keywords (or other kinds of objects) without doing a query for the keyword and getting a reference to the row containing that keyword?
When people read the many-to-many example in the docs, they get hit with the fact that if you create the same Keyword twice, it gets put in the DB twice. Which is somewhat inconvenient.
This recipe was created to address this issue: UsageRecipes/UniqueObject
Integrating with external tools
py2exe
There is a problem when converting scripts using SQLAlchemy to windows executable with py2exe. Problem and its possible solution was described in this post. And patch for SQLAlchemy trunk (as for 2007-02-08) is in attachment to this post.
