Whoosh on Google App Engine

Whoosh was recently patched to work on Google App Engine (GAE), and it does work, but we’re seeing a small yet consistent performance problem: search is annoyingly slow.

As always, this challenge has unique constraints (“forces”, in patterns-speak).

  1. The client isn’t complaining, i.e., isn’t paying to get this fixed, so I’m looking for the quickest possible workaround, even though this is all open source and I would love to contribute a solution. And “time is money”, so I’m favoring “guessing” over analysis and proof.
  2. Whoosh on GAE (W/GAE) is somewhat esoteric, still labeled “experimental”, and I can’t find reports from other users confirming this problem.
  3. GAE’s Memcache service was reported to misbehave occasionally, so it’s a probable suspect. The name is a bit of a misnomer, really, because it’s not local memory but a (billable) remote service (RPC). Anyway, we’re seeing a difference of three orders of magnitude in performance. Profiling with Appstats could exonerate or condemn it, but I’m still hoping to guess my way around this.
  4. Not sure, but our slowness seems steady, which doesn’t fit with spikes in Memcache’s latency, suggesting the problem’s elsewhere.
  5. We’re using Whoosh not for full-text but for faceted search, and wrapping the many metadata fields in MultiValueProperty classes for multilingual support; all these class hierarchy traversals, method resolutions, and loops just smell like trouble. (Whoosh itself isn’t (that) slow, with a simple schema?)
  6. Don’t know why, but I can’t post to the mailing list, and Google (Groups) is being extraordinarily obnoxious about it, bouncing my mail with absurdly unhelpful error messages. There’s a little traffic on Stack Overflow, but nothing recent/relevant. Ended up posting on GAE.
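
Short of a full Appstats session, a cheap way to check the hunch in item 4 (steady slowness versus occasional spikes) is to time the same call a few times and compare the worst run to the median. The sketch below is plain Python with no GAE or Whoosh calls; the function names and the threshold of 5 are my own assumptions, not anything from either library.

```python
from timeit import default_timer


def time_calls(fn, runs=20):
    """Call fn() `runs` times and return each call's duration in seconds."""
    durations = []
    for _ in range(runs):
        start = default_timer()
        fn()
        durations.append(default_timer() - start)
    return durations


def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def summarize(durations):
    """Return (median, worst, worst/median ratio) for a list of durations."""
    mid = median(durations)
    worst = max(durations)
    ratio = worst / mid if mid > 0 else float("inf")
    return mid, worst, ratio


def looks_spiky(durations, threshold=5.0):
    """True when the worst run is at least `threshold` times the median.

    The threshold is an arbitrary assumption; tune it to taste.
    """
    return summarize(durations)[2] >= threshold
```

Wrapping the search call in a zero-argument function and passing it to `time_calls` gives the durations. A high median with a low ratio points at work done on every request (schema, property wrappers); a low median with a high ratio points at an occasionally slow dependency.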

W/GAE: is it?

  1. How does Whoosh store its indices?
  2. What about the Datastore (NDB) 1MB limit? Blobstore vs Memcache?
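
To make the 1MB question concrete: any index file bigger than the limit would have to be split across several stored values and stitched back together on read. The sketch below only shows that split and join on raw bytes. It makes no GAE calls, the chunk size is an assumed safety margin below the limit, and I’m not claiming this is how Whoosh’s GAE storage actually works.

```python
# Assumed safety margin below the 1MB limit; not a documented constant.
CHUNK_SIZE = 1000000


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split a bytes object into pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks):
    """Reassemble the original bytes from an ordered list of chunks."""
    return b"".join(chunks)
```

Each chunk would then go into its own entity or cache key, which also means one read per chunk: a large index turns into several round trips instead of one.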

To Memcache or not?

  1. Memcache isn’t local, primary storage (RAM): it’s a remote service (RPC). (And billable!) Why use Memcache? How much faster than Datastore is it, really?
  2. Pros/cons of using instances’ RAM as caches, instead?
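
For the second question, here is roughly what “instance RAM as a cache” could look like: a per-process dictionary with a time-to-live, sitting in front of whatever slow fetch we already have. This is a minimal sketch, not a GAE API; `LocalCache`, `get_or_load`, and the 60-second default are names and values I made up for illustration.

```python
import time


class LocalCache(object):
    """In-process cache with a fixed time-to-live per entry (hypothetical helper)."""

    def __init__(self, ttl_seconds=60.0, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value
```

The pro is obvious: a hit costs a dictionary lookup, no RPC. The cons: every instance keeps its own copy, the cache is gone whenever the instance is shut down, and data can be stale for up to the TTL.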


