BSON→JSON

bson2json: just a quick & dirty BSON→JSON converter, with rudimentary schema analysis, for preparing to migrate a MongoDB dump to LevelDB.

BSON

  1. Dump files are just concatenated serialized objects. The BSON format is amenable to efficient streaming (each document carries its own byte length; see the framing sketch in item 3 below), so we just iterate over the “records” by piping a plain file stream into a BSON parser and stringifying each to JSON:
    fs=require 'fs'
    BSONStream=require 'bson-stream'
    
    fs.createReadStream process.argv[2]
    .pipe new BSONStream
    .on 'data',(o)->
    	console.log JSON.stringify o
    
  2. (console.log for data output? WTF? Yes: each document becomes one line of JSON on stdout, i.e. newline-delimited JSON for free.)
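  3. For the record, the property that makes streaming so cheap: every BSON document begins with its total byte length as a little-endian int32, so a concatenated dump can be split without decoding anything. A minimal framing sketch (a hypothetical splitBSON helper, not bson-stream’s actual internals):
    # Walk a buffer of concatenated BSON documents, invoking onDoc
    # with each complete document's raw bytes.
    splitBSON=(buf,onDoc)->
    	offset=0
    	while offset+4<=buf.length
    		len=buf.readInt32LE offset	# first 4 bytes = total document length
    		break if offset+len>buf.length	# incomplete trailing document
    		onDoc buf.slice offset,offset+len
    		offset+=len
    	offset	# bytes consumed; caller keeps the remainder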

Schema

  1. Collect (and count) variations in the objects’ “schema”, to gain insight into the variety of “documents” stored in that particular file.
    schema={}
    

    Mapping of signatures to counts, where signatures are sorted lists of properties stringified to JSON:

    .on 'data',(o)->
    	ks=Object.keys(o).sort()
    	s=JSON.stringify ks
    	schema[s] or=0
    	schema[s]++
    .on 'end',->
    	console.log 'Schema variations:',(Object.keys schema).length,schema
    
  2. Also show a union of all schema variations (the fully assembled pass is sketched in item 3 below):
    keys={}
    
    .on 'data',(o)->
    	#...as above, plus:
    	for k in ks
    		keys[k]=true
    .on 'end',->
    	console.log 'Keys (union):',Object.keys keys
    
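  3. Putting the fragments together: a sketch of the whole analysis pass, same bson-stream setup as above, nothing beyond what the snippets already show:
    fs=require 'fs'
    BSONStream=require 'bson-stream'
    
    schema={}	# signature -> count
    keys={}	# union of all property names
    
    fs.createReadStream process.argv[2]
    .pipe new BSONStream
    .on 'data',(o)->
    	ks=Object.keys(o).sort()
    	s=JSON.stringify ks
    	schema[s] or=0
    	schema[s]++
    	keys[k]=true for k in ks
    .on 'end',->
    	console.log 'Schema variations:',(Object.keys schema).length,schema
    	console.log 'Keys (union):',Object.keys keys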

EOF and callbacks?

  1. Node keeps the process alive while the event loop still has pending callbacks or open handles, so it won’t exit before our stream’s ‘data’ and ‘end’ handlers have all run.
  2. But…?
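  3. (Item 1 is easy to convince yourself of with a throwaway check: nothing below blocks or waits after the handlers are registered, yet ‘end’ still fires before the process exits.)
    fs=require 'fs'
    s=fs.createReadStream process.argv[2]
    s.on 'data',->	# drain; nothing else keeps the process busy
    s.on 'end',-> console.log 'end fired, so the event loop kept us alive'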

Multiple files

  1. Out of laziness — quick & dirty, like we said — just looped over all files with a shell script:
    #!/bin/sh
    for f in *.bson
    do ./bson2json.coffee "$f" > "${f%.bson}.json"
    done
    
  2. So files are processed sequentially (each invocation runs to completion before the next starts), in the order the glob expands them onto the command line (alphabetical, left to right).
  3. In a more invested migration script we’ll iterate over them recursively (a minimal walk is sketched in item 4 below). Later.
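  4. (For when “later” comes: a minimal synchronous walk, sketched with plain Node fs/path, no extra dependencies.)
    fs=require 'fs'
    path=require 'path'
    
    # Recurse into dir, calling cb with the path of every .bson file found.
    walk=(dir,cb)->
    	for name in fs.readdirSync dir
    		p=path.join dir,name
    		if fs.statSync(p).isDirectory()
    			walk p,cb
    		else if path.extname(p) is '.bson'
    			cb p
    	return
    
    walk process.argv[2],(f)-> console.log f	# e.g.: just list them for now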

