[storm] Update or add pattern with Storm

Sat Oct 11 00:49:26 BST 2008

On Fri, 2008-10-10 at 11:25 -0700, Jamu Kakar wrote:

[snip]

> I use ResultSet.any to produce the same result as your code snippet
> above, to avoid using COUNT:
> 
> exists = bool(store.find(Customer, Customer.last_name == u"Sparrow").any())

Yes, I hit the table-scan issue with COUNT on an earlier version of
PostgreSQL, but I believe it has since been fixed.
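For comparison, here is a minimal sketch of the two styles of existence check, written against Python's standard `sqlite3` module rather than Storm so it runs standalone. The `customer` table and the `exists_by_count`/`exists_by_limit` helpers are hypothetical. The point is that COUNT(*) has to find every matching row before it can answer, whereas a `LIMIT 1` query can stop at the first match. As I understand it, Storm's `ResultSet.any()` fetches at most one arbitrary row in a similar way, but treat that as an assumption rather than a description of Storm's internals.

```python
import sqlite3

def make_db():
    # In-memory database with a hypothetical customer table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [("Jack", "Sparrow"), ("Will", "Turner"), ("Elizabeth", "Swann")],
    )
    return conn

def exists_by_count(conn, last_name):
    # COUNT(*) must find every matching row before it can answer.
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE last_name = ?", (last_name,)
    ).fetchone()
    return n > 0

def exists_by_limit(conn, last_name):
    # LIMIT 1 lets the database stop at the first matching row.
    row = conn.execute(
        "SELECT 1 FROM customer WHERE last_name = ? LIMIT 1", (last_name,)
    ).fetchone()
    return row is not None

conn = make_db()
print(exists_by_count(conn, "Sparrow"), exists_by_limit(conn, "Sparrow"))    # True True
print(exists_by_count(conn, "Barbossa"), exists_by_limit(conn, "Barbossa"))  # False False
```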

> To solve the original poster's issue, where one needs to determine
> if an object exists and then create or update it, I'd use something
> like:
> 
> customer = store.find(Customer, Customer.last_name == u"Sparrow").one()
> if not customer:
>      customer = store.add(Customer(u"Jack", u"Sparrow"))
> else:
>      customer.first_name = u"Jack"
>      customer.last_name = u"Sparrow"
> 
> You need to use a query that will only ever return 0 or 1 row when
> you use ResultSet.one.  If more than 1 row is returned a NotOneError
> will be raised.

This is essentially what I've been doing, so thank you both for your
advice.
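Jamu's find-then-add-or-update pattern, including the "zero or one row" rule, can be mimicked in plain `sqlite3` for experimentation. This is a sketch, not Storm: `NotOneError` below is a hypothetical stand-in for Storm's exception of the same name, and the hypothetical `find_one` imitates `ResultSet.one()` by fetching at most two rows, which is enough to tell "none", "one" and "more than one" apart.

```python
import sqlite3

class NotOneError(Exception):
    """Hypothetical stand-in for Storm's NotOneError."""

def find_one(conn, last_name):
    # Fetch up to two rows: enough to distinguish none, one and too many.
    rows = conn.execute(
        "SELECT rowid, first_name, last_name FROM customer "
        "WHERE last_name = ? LIMIT 2",
        (last_name,),
    ).fetchall()
    if len(rows) > 1:
        raise NotOneError("more than one row matched")
    return rows[0] if rows else None

def add_or_update(conn, first_name, last_name):
    row = find_one(conn, last_name)
    if row is None:
        # No match: create a new row.
        conn.execute(
            "INSERT INTO customer (first_name, last_name) VALUES (?, ?)",
            (first_name, last_name),
        )
    else:
        # Exactly one match: update it in place, using its rowid.
        conn.execute(
            "UPDATE customer SET first_name = ? WHERE rowid = ?",
            (first_name, row[0]),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
add_or_update(conn, "Jack", "Sparrow")          # inserts
add_or_update(conn, "Captain Jack", "Sparrow")  # updates the same row
print(conn.execute("SELECT first_name, last_name FROM customer").fetchall())
# [('Captain Jack', 'Sparrow')]
```

In practice the lookup column would want a UNIQUE constraint; otherwise a duplicate can slip in and every later `one()`-style lookup starts raising.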

On reflection, I think I actually have several distinct problems rather
than just this one. I've been trying to solve them all with the same
code, when they're really different and each needs its own code.

In a nutshell, I have many thousands of objects (10-15k or more) in
the database. I may be discovering information about only a small
percentage of them, or possibly all of them. The discovered information
may be a subset of, overlap with, or be a superset of the information
already in the database.
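One way to make the subset/overlap/superset relationship concrete is to compare object keys as Python sets. This sketch assumes each object can be identified by a unique, hashable key; `classify` and the example keys are hypothetical.

```python
def classify(db_keys, discovered_keys):
    """Split keys into what to insert, what to update and what may be stale.

    Assumes every object has a unique, hashable key.
    """
    db_keys = set(db_keys)
    discovered_keys = set(discovered_keys)
    to_insert = discovered_keys - db_keys    # discovered but not stored yet
    to_update = discovered_keys & db_keys    # present in both
    maybe_stale = db_keys - discovered_keys  # stored but not seen this time
    return to_insert, to_update, maybe_stale

to_insert, to_update, maybe_stale = classify({"a", "b", "c"}, {"b", "c", "d"})
print(sorted(to_insert), sorted(to_update), sorted(maybe_stale))
# ['d'] ['b', 'c'] ['a']
```

Keys in `maybe_stale` are only genuinely outdated when the discovery covered everything, which is the situation described next.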

In some cases, I want to detect that information in the database is
outdated and should be removed, in which case the new information takes
precedence. To cater for that case, the code checks every object in the
database, and if the discovered information doesn't contain an object,
that object is flagged as outdated and removed.
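The removal step described above amounts to a full scan plus a set-membership test. A minimal `sqlite3` sketch, assuming a hypothetical `item` table keyed by a text `key` column; `purge_stale` is an illustrative name, not part of Storm:

```python
import sqlite3

def purge_stale(conn, discovered_keys):
    """Delete every row whose key was not in the latest discovery.

    Visits every row, so its cost grows with the size of the table,
    not with the amount of newly discovered information.
    """
    discovered = set(discovered_keys)
    # Materialise the list first so the SELECT cursor is fully consumed
    # before any DELETE runs.
    stale = [
        (key,)
        for (key,) in conn.execute("SELECT key FROM item")
        if key not in discovered
    ]
    conn.executemany("DELETE FROM item WHERE key = ?", stale)
    conn.commit()
    return len(stale)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)", [("a", "1"), ("b", "2"), ("c", "3")]
)
removed = purge_stale(conn, ["b", "c", "d"])
print(removed, [k for (k,) in conn.execute("SELECT key FROM item ORDER BY key")])
# 1 ['b', 'c']
```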

But most of the time I'm either discovering new information or updating
existing information. Scanning every object in the database for these
cases is pretty wasteful; I should just be searching for the specific
objects I've discovered.
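Searching only for the discovered objects can be done with an `IN (...)` query over their keys. In Storm this would presumably be expressed with a column's `is_in()` method; the sketch below uses raw `sqlite3` instead, against a hypothetical `item` table, and splits the keys into batches because SQLite limits the number of bound parameters per statement (999 by default before SQLite 3.32.0). `fetch_by_keys` and `batch_size` are illustrative names.

```python
import sqlite3

def fetch_by_keys(conn, keys, batch_size=500):
    """Return {key: value} for only those keys that exist in the table."""
    keys = list(keys)
    found = {}
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        # One "?" per key; the keys themselves are passed as parameters.
        placeholders = ", ".join("?" for _ in batch)
        query = f"SELECT key, value FROM item WHERE key IN ({placeholders})"
        for key, value in conn.execute(query, batch):
            found[key] = value
    return found

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)", [(str(i), "v%d" % i) for i in range(15000)]
)
found = fetch_by_keys(conn, ["3", "42", "missing"])
print(sorted(found.items()))  # [('3', 'v3'), ('42', 'v42')]
```

With `key` as the primary key, each batch is answered from the index, so the cost tracks the number of discovered objects rather than the size of the table.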

So I think what I really want is a 'purge' flag that enables the full
object scan only when it's needed. Then I can make the common case more
efficient by processing only the new information and leaving most of the
database alone.
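A rough sketch of that 'purge' flag, again with raw `sqlite3` and a hypothetical `item` table: the common path touches only the discovered keys, and the full-table scan runs only when `purge=True`. The function name `sync` and the `{key: value}` input format are assumptions for illustration.

```python
import sqlite3

def sync(conn, discovered, purge=False):
    """Insert or update the discovered {key: value} pairs.

    Only when purge is True are rows missing from `discovered` deleted,
    which requires scanning the whole table. Returns the number deleted.
    """
    for key, value in discovered.items():
        # Try an update by primary key first; insert if nothing matched.
        cur = conn.execute("UPDATE item SET value = ? WHERE key = ?", (value, key))
        if cur.rowcount == 0:
            conn.execute("INSERT INTO item (key, value) VALUES (?, ?)", (key, value))
    removed = 0
    if purge:
        # Full scan: only done on request.
        stale = [
            (key,)
            for (key,) in conn.execute("SELECT key FROM item")
            if key not in discovered
        ]
        conn.executemany("DELETE FROM item WHERE key = ?", stale)
        removed = len(stale)
    conn.commit()
    return removed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (key TEXT PRIMARY KEY, value TEXT)")
sync(conn, {"a": "1", "b": "2", "c": "3"})
sync(conn, {"b": "20"})  # common case: only "b" is touched
print(conn.execute("SELECT key, value FROM item ORDER BY key").fetchall())
# [('a', '1'), ('b', '20'), ('c', '3')]
removed = sync(conn, {"b": "20", "d": "4"}, purge=True)  # full refresh
print(removed, conn.execute("SELECT key, value FROM item ORDER BY key").fetchall())
# 2 [('b', '20'), ('d', '4')]
```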

-- 
Justin Warren <daedalus at eigenmagic.com>