BSON→JSON
bson2json: just a quick & dirty BSON→JSON converter with rudimentary schema analysis, for preparing to migrate a MongoDB dump to LevelDB.
BSON
- Dump files are just concatenated serialized objects. The BSON format is amenable to efficient streaming, so we just iterate over the "records" by piping a standard file stream into a BSON parser and stringifying each record to JSON:
```coffeescript
fs = require 'fs'
BSONStream = require 'bson-stream'

fs.createReadStream process.argv[2]   # CLI: expect path.
  .pipe new BSONStream
  .on 'data', (o) -> console.log JSON.stringify o
```
Schema
- Collect (and count) variations in object shapes, to gain insight into variety of "documents" stored in that particular file.
```coffeescript
schema = {}
```

- Map signatures to counts, where a signature is the sorted list of an object's keys, stringified to JSON:

```coffeescript
.on 'data', (o) ->
  s = JSON.stringify (Object.keys o).sort()
  schema[s] or= 0
  schema[s]++
.on 'end', -> console.log 'Schema variations:', (Object.keys schema).length, schema
```
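The same counting logic can be sketched in plain JavaScript as a pure function (the name `schemaCounts` is mine, not from the tool):

```javascript
// Sketch: count object-shape "signatures" across a batch of objects.
// A signature is the sorted list of an object's keys, JSON-stringified,
// so two objects with the same keys (in any order) share one signature.
function schemaCounts(objects) {
  const schema = {};
  for (const o of objects) {
    const s = JSON.stringify(Object.keys(o).sort());
    schema[s] = (schema[s] || 0) + 1;
  }
  return schema;
}
```

Feeding it `[{a: 1, b: 2}, {b: 3, a: 4}, {a: 1}]` yields two variations, since the first two objects share the signature `["a","b"]`.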
- Also show a union of all schema variations:
```coffeescript
.on 'data', (o) ->
  …
  for k in ks
    used[k] = true   # (JS Set API sucks.)
.on 'end', -> console.log 'Keys used:', Object.keys used
```
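A self-contained JavaScript sketch of the union step, using a plain object as a set (the helper name `keyUnion` is mine):

```javascript
// Sketch: union of all keys seen across a batch of objects.
// A plain object stands in for a set; Object.keys(used) is the union.
function keyUnion(objects) {
  const used = {};
  for (const o of objects) {
    for (const k of Object.keys(o)) used[k] = true;
  }
  return Object.keys(used);
}
```

For example, `keyUnion([{a: 1}, {b: 2, a: 3}])` returns `['a', 'b']`.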
EOF and callbacks?
- Node guarantees the process won't exit before all queued callbacks have executed.
- But…?
Multiple files
- Out of laziness — we said quick & dirty — just looped over all files with a shell script:
```sh
#!/bin/sh
for f in *.bson
do
  ./bson2json.coffee "$f" > "${f%.bson}.json"
done
```
- So files are processed sequentially, one process at a time, in whatever order the shell's glob expansion yields.
- In a more invested migration script we'd iterate over them recursively.
--
The real world is a special case