bson2json: just a quick & dirty BSON→JSON converter, plus rudimentary schema analysis, for preparing to migrate a MongoDB dump to LevelDB.


  1. Dump files are just concatenated serialized objects. The BSON format is amenable to efficient streaming, so we just iterate over the "records" by piping a standard file stream into a BSON parser and stringifying each one to JSON:
    fs = require 'fs'
    BSONStream = require 'bson-stream'
    fs.createReadStream process.argv[2] # CLI: expect path.
    	.pipe new BSONStream # emits one parsed object per BSON document
    	.on 'data',(o)->
    		console.log JSON.stringify o
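    Usage, assuming the snippet above is saved as bson2json.coffee (the script name and dump path here are hypothetical):
    	coffee bson2json.coffee dump/users.bson > users.json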


  1. Collect (and count) variations in object shapes, to gain insight into the variety of "documents" stored in a particular file.
    Mapping signatures to counts, where a signature is the sorted list of an object's keys, stringified to JSON (a complete runnable version follows as item 3):
    .on 'data',(o)->
    	s = JSON.stringify (Object.keys o).sort()
    	schema[s] or= 0 # schema = {}, initialized up front
    	schema[s]++     # count this shape
    .on 'end',->
    	console.log 'Schema variations:',(Object.keys schema).length,schema
  2. Also show the union of all keys across schema variations:
    .on 'data',(o)->
    	for k in Object.keys o
    		used[k] = true # used = {}, initialized up front. (JS Set API sucks.)
    .on 'end',->
    	console.log 'Keys used:',Object.keys used
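  3. A complete, runnable sketch combining both analyses, under the same bson-stream pipeline as in step 1:
    fs = require 'fs'
    BSONStream = require 'bson-stream'
    schema = {} # signature -> count
    used = {}   # key -> true (poor man's set)
    fs.createReadStream process.argv[2]
    	.pipe new BSONStream
    	.on 'data',(o)->
    		ks = (Object.keys o).sort()
    		s = JSON.stringify ks
    		schema[s] or= 0
    		schema[s]++
    		used[k] = true for k in ks
    	.on 'end',->
    		console.log 'Schema variations:',(Object.keys schema).length,schema
    		console.log 'Keys used:',Object.keys used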

EOF and callbacks?

  1. Node guarantees the process won't exit before all queued callbacks have executed; the event loop keeps the process alive while handles such as our read stream are still open.
  2. But…?
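
A minimal sketch of the guarantee in item 1 (the timer just stands in for any pending work, such as our open read stream):
    setTimeout (-> console.log 'queued callback ran'), 1000
    # The process stays alive until the callback above has executed,
    # then exits on its own once the event loop is empty.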

Multiple files

  1. Out of laziness — we said quick & dirty — just looped over all files with a shell script:
    for f in *.bson
    do ./bson2json "$f" > "${f%.bson}.json" # assuming the converter is saved as an executable named bson2json
    done
  2. So files are processed sequentially and synchronously, one per invocation, in whatever order Bash expands the glob.
  3. In a more invested migration script we'll iterate over the files recursively, walking the dump directory; a sketch follows.
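    A minimal sketch of that recursive walk (synchronous fs calls for simplicity; names here are illustrative):
    fs = require 'fs'
    path = require 'path'
    # Recurse through dir, invoking cb with the path of every .bson file.
    walk = (dir, cb) ->
    	for name in fs.readdirSync dir
    		p = path.join dir, name
    		if (fs.statSync p).isDirectory() then walk p, cb
    		else if (path.extname p) is '.bson' then cb p
    walk process.argv[2], (file) ->
    	console.log 'would convert:', file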

The real world is a special case