Whoosh on Google App Engine
Whoosh was recently patched to work on Google App Engine (GAE), and it does work, but we're seeing a small, consistent performance problem: search is annoyingly slow.
As always, the challenge has unique constraints ("forces", in patterns-speak):
- The client isn't complaining, i.e., isn't paying to get this fixed, so I'm looking for the quickest possible workaround, even though this is all open source and I would love to contribute a solution. And "time is money", so I'm favoring "guessing" over analysis and proof.
- Whoosh on GAE (W/GAE) is somewhat esoteric, still labeled "experimental", and I can't find reports from other users confirming this problem.
- GAE's Memcache service was reported to misbehave occasionally, so it's a probable suspect. The name is a bit of a misnomer, really: it's not local memory but a (billable) remote service (RPC). Anyway, we're seeing a three-orders-of-magnitude difference in performance. Profiling with Appstats could exonerate or condemn it, but I'm still hoping to guess my way around this.
- Not sure, but our slowness seems consistent, which doesn't fit with spikes in Memcache's latencies, suggesting the problem is elsewhere.
- We're using Whoosh not for full-text search but for faceted search, and wrapping the many metadata fields in MultiValueProperty classes for multilingual support; all these class hierarchy traversals, method resolutions, and loops just smell like trouble. (Whoosh itself isn't (that) slow, with a simple schema?)
- Don't know why, but I can't post to the mailing list, and Google (Groups) is being extraordinarily obnoxious about it, bouncing my mail with absurdly unhelpful error messages. There's a little traffic on Stack Overflow, but nothing recent/relevant. Ended up posting on GAE.
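Short of a full Appstats session, the "consistent vs. spiky" hunch above can at least be checked cheaply. The following is only a rough sketch: `time_calls`, `summarize`, `looks_spiky`, and the threshold of 10 are my own illustrative names and guesses, not part of Whoosh or GAE. It times whatever callable you hand it (for example, a function that runs one search) and compares the worst call to the median.

```python
import time


def time_calls(fn, repeats=20):
    """Call fn() `repeats` times and return each call's duration in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.time()
        fn()
        samples.append(time.time() - start)
    return samples


def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def summarize(samples):
    """Return (median, worst, worst/median ratio) for a list of durations."""
    mid = median(samples)
    worst = max(samples)
    ratio = worst / mid if mid > 0 else float("inf")
    return mid, worst, ratio


def looks_spiky(samples, threshold=10.0):
    """Heuristic (threshold is a guess): worst call far above the median
    suggests occasional spikes rather than steady slowness."""
    return summarize(samples)[2] >= threshold
```

If every call is about equally slow, the ratio stays near 1, which would point away from occasional Memcache hiccups and toward something that costs the same on every request.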
W/GAE: is it?
- How does Whoosh store its indices?
- What about the Datastore (NDB) 1MB limit? Blobstore vs Memcache?
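One generic way around a per-entity size limit is to split a large blob into pieces and store each piece separately. The sketch below shows only the splitting and reassembly; it is not a description of how Whoosh's GAE storage actually works, and `CHUNK_SIZE`, `split_blob`, and `join_blob` are hypothetical names with an assumed safety margin under 1MB.

```python
# Assumed safety margin below the 1MB limit, leaving room for entity overhead.
CHUNK_SIZE = 900 * 1024


def split_blob(data, chunk_size=CHUNK_SIZE):
    """Split a byte string into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_blob(chunks):
    """Reassemble the pieces, in order, into the original byte string."""
    return b"".join(chunks)
```

Each piece would then need its own key plus an index recording the order, and reading the blob back costs one fetch per piece.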
To Memcache or not?
- Memcache isn't local, primary storage (RAM): it's a remote service (RPC). (And billable!) Why use Memcache? How much faster than Datastore is it really?
- Pros/cons of using instances' RAM as caches, instead?
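To make the instance-RAM option concrete, here is a minimal sketch of a per-instance cache sitting in front of some slower lookup. Everything in it is an assumption for illustration: `LocalCache`, the `loader` callable (which stands in for whatever remote fetch you already have, Memcache or Datastore), and the 60-second default time-to-live.

```python
import time


class LocalCache(object):
    """Per-instance in-memory cache with a time-to-live, in front of a slower loader."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.time):
        self._loader = loader        # hypothetical: callable taking a key, doing the slow fetch
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._entries = {}           # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        """Return the cached value if still fresh, else reload and cache it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

The upside is that a hit costs no RPC at all. The downsides: every instance keeps its own copy, copies can go stale until the time-to-live runs out, the cache starts empty whenever an instance starts, and this sketch never evicts anything, so memory use grows with the number of distinct keys.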
The real world is a special case