One thing that could be done for now is to write a piece of
mzscheme code that marshals the data into UTF-8-encoded
byte strings. Assuming that most of the 2 GB is made up of
strings, and that these strings are mostly ASCII, this
should reduce memory consumption by close to a factor of 4.
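
A rough sketch of the conversion, assuming mzscheme's built-in string->bytes/utf-8 and bytes->string/utf-8. The names marshal-string and unmarshal-string are made up for illustration. The factor of 4 rests on the assumption that mzscheme stores each string character in 4 bytes, whereas an ASCII character takes 1 byte in UTF-8.

```scheme
;; Hypothetical helper names; only the two conversion calls are built in.
(define (marshal-string s)   (string->bytes/utf-8 s))
(define (unmarshal-string b) (bytes->string/utf-8 b))

(define s "hello, arc")
(define b (marshal-string s))

;; For pure-ASCII text the byte count equals the character count,
;; so each character costs one byte in the marshalled form.
(= (bytes-length b) (string-length s))

;; Round trip: decoding gives back an equal string.
(equal? (unmarshal-string b) s)
```

Non-ASCII characters take 2 to 4 bytes each in UTF-8, so the saving shrinks as the share of such characters grows.
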
(I can imagine an interface that is transparent at the arc
level, where strings are just passed to the backend and
retrieved from it, and the backend converts them to and
from byte strings. Later on it could change to use a FS or
a DB or whatever.)
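
Such a backend might look roughly like the following. This is only a sketch: store, backend-put! and backend-get are hypothetical names, and the in-memory hash table is a stand-in for whatever storage is eventually used. It assumes the mzscheme hash-table procedures (make-hash-table, hash-table-put!, hash-table-get); newer PLT/Racket versions call these make-hash, hash-set! and hash-ref.

```scheme
;; Hypothetical in-memory backend; could later be swapped for a FS or DB.
(define store (make-hash-table 'equal))

;; Callers pass an ordinary string; it is kept as UTF-8 bytes.
(define (backend-put! key str)
  (hash-table-put! store key (string->bytes/utf-8 str)))

;; Callers get an ordinary string back, or #f if the key is absent.
(define (backend-get key)
  (let ((b (hash-table-get store key (lambda () #f))))
    (and b (bytes->string/utf-8 b))))

(backend-put! "greeting" "hello")
(backend-get "greeting")   ; => "hello"
```

Since callers only ever see strings, the byte-string representation stays an internal detail of the backend.
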