Wanted: Memory-conservative key-value store
I would really like to find a key-value store that is memory conservative. What we currently have is a souped-up version of the dumbdbm standard library module: it has a cache budget and, as it loads in new values, flushes older ones to make room. However, as the amount of data managed increases, so does the amount of key metadata indicating whereabouts the values lie on disk. So the next step is either to add some form of caching for key metadata, or to find a suitable free open source solution.
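For concreteness, here is roughly the shape of what I mean, as a minimal sketch rather than the actual code (the class name and the one-megabyte budget are invented for illustration):

```python
import os
from collections import OrderedDict

class BudgetedStore(object):
    # Hypothetical sketch of the current approach, not the real code:
    # an append-only data file, an in-memory key -> (offset, length) index,
    # and an LRU cache of values capped at a byte budget.  The index is the
    # part that is NOT yet budgeted, which is the problem described above.

    def __init__(self, path, cache_budget=1 << 20):
        self._data = open(path, "a+b")
        self._index = {}              # key -> (offset, length), all in memory
        self._cache = OrderedDict()   # key -> value bytes, oldest first
        self._cache_budget = cache_budget
        self._cache_used = 0

    def __setitem__(self, key, value):
        self._data.seek(0, os.SEEK_END)
        offset = self._data.tell()
        self._data.write(value)
        self._index[key] = (offset, len(value))
        self._remember(key, value)

    def __getitem__(self, key):
        if key in self._cache:
            value = self._cache.pop(key)     # re-insert to mark most recent
            self._cache[key] = value
            return value
        offset, length = self._index[key]    # KeyError if never stored
        self._data.seek(offset)
        value = self._data.read(length)
        self._remember(key, value)
        return value

    def _remember(self, key, value):
        if key in self._cache:
            self._cache_used -= len(self._cache.pop(key))
        self._cache[key] = value
        self._cache_used += len(value)
        # Flush the least recently used values until we are back under budget.
        while self._cache_used > self._cache_budget and len(self._cache) > 1:
            _, old_value = self._cache.popitem(last=False)
            self._cache_used -= len(old_value)
```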
Does anyone know of a suitable store that is not constrained by the GPL? It doesn't have to be Python, but Python bindings are a bonus.
Considering Sqlite
When thinking of low-memory database solutions, Sqlite is one that comes to mind, and even better, it comes as part of the Python distribution these days. And even betterer, there's a custom port for my uncommon platform of choice. And even... bettererer, it has an IO abstraction layer that allows it to work with custom IO solutions with minimal additional work. Additionally, reading the spiel makes it sound appealing memory-wise:
SQLite is a compact library. With all features enabled, the library size can be less than 300KiB, depending on compiler optimization settings. (Some compiler optimizations such as aggressive function inlining and loop unrolling can cause the object code to be much larger.) If optional features are omitted, the size of the SQLite library can be reduced below 180KiB. SQLite can also be made to run in minimal stack space (4KiB) and very little heap (100KiB), making SQLite a popular database engine choice on memory constrained gadgets such as cellphones, PDAs, and MP3 players. There is a tradeoff between memory usage and speed. SQLite generally runs faster the more memory you give it. Nevertheless, performance is usually quite good even in low-memory environments.

But you know what? I am as yet unable to get it down to 180KiB no matter how many features I compile out of it using the handy SQLITE_OMIT... options. And not all options can be omitted if I want to use pysqlite, as it does not suit the maintainer to support them.
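On the runtime-memory side of that quote, at least the page cache can be capped from Python without any special build, which is encouraging. A rough sketch of using SQLite as a key-value table under a small cache budget (the 64KiB figure and the table layout are placeholders, and page_size is only honoured on a freshly created database):

```python
import sqlite3

conn = sqlite3.connect("store.db")
conn.execute("PRAGMA page_size = 1024")   # must be set before the database has any content
conn.execute("PRAGMA cache_size = -64")   # a negative value is a limit in KiB of page cache
conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")

def put(key, value):
    conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))
    conn.commit()

def get(key):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None
```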
Here's a clipped table of the code sizes for various cross-compilations:
| | Overall (bytes) | libsqlite3.a | libpysqlite3.a | Description |
|---|---|---|---|---|
| 1 | | 0 | 0 | Without sqlite |
| 2 | 487536 | 425060 | 49860 | Sqlite with optimise for size |
| 3 | 365212 | 308730 | 46740 | Sqlite with optimise for size + code omissions |
| 4 | 536608 | 472290 | 50560 | Sqlite with full optimise |
| 5 | 402492 | 344440 | 47430 | Sqlite with full optimise + code omissions |
In the Windows Python 2.7 installation, _sqlite3.pyd is 48KB and sqlite3.dll is 417KB. So the sizes above are still somewhat larger than that, even though those Windows binaries are presumably built with no omissions and full optimisation. But more or less close enough.
Considering home grown
Any third-party solution would need to be adapted to deal with the custom IO needs, unless it was written in pure Python. At this point, the simplest option is just to extend what I already have.
Edit: Just a note, the key desired feature is memory management. It should be possible to put hard constraints on the amount of memory the store uses, both for the cached records read from disk and for the lookup information that maps keys to the locations of records on disk. Most key-value stores I have looked at either claim to keep all keys in memory as a feature, or just keep them all in memory because it is the simple thing to do.
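To make that concrete, one direction the home-grown extension could take, sketched very loosely (shelve standing in for whatever on-disk format the index ends up using, with all names and numbers invented for illustration): hash the key-to-location metadata into buckets on disk and only allow a fixed number of buckets to be resident at once.

```python
import shelve   # stand-in for whatever on-disk format the index ends up using
import zlib

class BoundedKeyIndex(object):
    # Illustrative sketch only: key -> (offset, length) metadata is hashed
    # into buckets stored on disk, and at most `resident_limit` buckets are
    # held in memory at any time, giving a hard cap on index memory.

    def __init__(self, path, buckets=256, resident_limit=16):
        self._store = shelve.open(path)   # bucket id -> {key: (offset, length)}
        self._buckets = buckets
        self._resident = {}               # the bounded in-memory subset
        self._resident_limit = resident_limit
        self._order = []                  # least recently used bucket ids first

    def _bucket_for(self, key):
        bucket_id = str(zlib.crc32(key.encode("utf-8")) % self._buckets)
        if bucket_id in self._resident:
            self._order.remove(bucket_id)
        else:
            if len(self._resident) >= self._resident_limit:
                # Write the least recently used bucket back to disk to make room.
                victim = self._order.pop(0)
                self._store[victim] = self._resident.pop(victim)
            self._resident[bucket_id] = self._store.get(bucket_id, {})
        self._order.append(bucket_id)
        return bucket_id

    def set_location(self, key, offset, length):
        self._resident[self._bucket_for(key)][key] = (offset, length)

    def get_location(self, key):
        return self._resident[self._bucket_for(key)][key]

    def close(self):
        for bucket_id, entries in self._resident.items():
            self._store[bucket_id] = entries
        self._store.close()
```

The cached record values would get the same treatment as in the earlier sketch, so both halves of the memory use end up with a hard ceiling.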