bson2json: a quick & dirty BSON→JSON converter with rudimentary schema analysis, for preparing to migrate a MongoDB dump to LevelDB.
BSON
- Dump files are just concatenated serialized objects. BSON is amenable to efficient streaming, so we iterate over the “records” by piping an ordinary file stream into a BSON parser and stringifying each object to JSON:
fs = require 'fs'
BSONStream = require 'bson-stream'

fs.createReadStream process.argv[2]
  .pipe new BSONStream
  .on 'data', (o) -> console.log JSON.stringify o
- (console.log? WTF?)
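Why BSON streams so easily: every document carries its own total byte length up front, as a little-endian int32, so a parser can split a concatenated dump without understanding the payloads. A minimal illustration of the framing (not how bson-stream is actually implemented; just the idea):

fs = require 'fs'

buf = Buffer.alloc 0
fs.createReadStream(process.argv[2]).on 'data', (chunk) ->
  buf = Buffer.concat [buf, chunk]
  # each BSON document starts with its total size as a little-endian int32
  while buf.length >= 4 and buf.length >= (len = buf.readInt32LE 0)
    doc = buf.slice 0, len          # one complete document
    buf = buf.slice len
    console.log 'next document:', len, 'bytes'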
Schema
- Collect (and count) variations in the objects’ “schema”, to gain insight into the variety of “documents” stored in that particular file.
- Mapping of signatures to counts, where a signature is the sorted list of an object’s property names, stringified to JSON (worked example below):

schema = {}

# ...same pipeline as above, with these handlers instead:
  .on 'data', (o) ->
    ks = Object.keys(o).sort()
    s = JSON.stringify ks
    schema[s] or= 0
    schema[s]++
  .on 'end', ->
    console.log 'Schema variations:', (Object.keys schema).length, schema
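For instance, two made-up documents {b: 2, a: 1} and {a: 9, b: 0} reduce to the same signature:

JSON.stringify Object.keys({b: 2, a: 1}).sort()    # '["a","b"]'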
- Also show the union of all keys across schema variations:
keys = {}

# ...again chained onto the same pipeline:
  .on 'data', (o) ->
    # ... (ks as above)
    keys[k] = true for k in ks
  .on 'end', ->
    console.log 'Keys (union):', Object.keys keys
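Assembled into one file, a sketch of the whole analyzer (the same fragments as above, wired together; usage: ./bson2json.coffee dump.bson):

#!/usr/bin/env coffee
fs = require 'fs'
BSONStream = require 'bson-stream'

schema = {}
keys = {}

fs.createReadStream process.argv[2]
  .pipe new BSONStream
  .on 'data', (o) ->
    console.log JSON.stringify o          # the actual conversion
    ks = Object.keys(o).sort()            # schema bookkeeping
    s = JSON.stringify ks
    schema[s] or= 0
    schema[s]++
    keys[k] = true for k in ks
  .on 'end', ->
    console.log 'Schema variations:', (Object.keys schema).length, schema
    console.log 'Keys (union):', Object.keys keys

Note everything, summaries included, goes to stdout; switching the two summary lines to console.error would keep them out of a redirected .json file.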
EOF and callbacks?
- Node guarantees the process won’t exit before all queued callbacks have executed (one-liner below).
- But…?
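For the record, the guarantee in its simplest form, a script whose last line has long since run when the callback fires:

setTimeout (-> console.log 'event loop kept us alive'), 1000
# node reaches the end of the file immediately, but only exits after the timer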
Multiple files
- Out of laziness (quick & dirty, like we said) we just looped over all the files with a shell script:
#!/bin/sh
for f in *.bson
do
  ./bson2json.coffee "$f" > "${f%.bson}.json"
done
- So files are processed sequentially and synchronously, in the order the shell expands the glob (left to right, i.e. alphabetically).
- In a more invested migration script we’ll iterate over them recursively. Later.
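Later, sure; but for the record, a sketch of that recursion (names are illustrative, same bson-stream pipeline as above, one output file per input):

#!/usr/bin/env coffee
fs = require 'fs'
BSONStream = require 'bson-stream'

# process a list of files one at a time by recursing in the 'end' callback
convert = (files) ->
  return if files.length is 0
  [f, rest...] = files
  out = fs.createWriteStream f.replace /\.bson$/, '.json'
  fs.createReadStream f
    .pipe new BSONStream
    .on 'data', (o) -> out.write JSON.stringify(o) + '\n'
    .on 'end', ->
      out.end()
      convert rest                  # next file only after this one finishes

convert process.argv[2..]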