Throttled Producers and Consumers in Java


BlockingQueue queue = new ArrayBlockingQueue<>(QUEUE_MAXIMUM_SIZE);

producers:  queue.put(dbObject);

consumers: while (RUNNING) {  DBObject dbObject = queue.poll(1, TimeUnit.SECONDS); ...

When transferring large volumes of data from one type of database (Mongo for example), to another (DynamoDB), the fastest way is to do things in parallel. Reading from one Mongo cursor from one Mongo server in the cluster gives you limited performance. Writing to DynamoDB on one thread maxes out quickly to something like 100/s. I ended up creating many threads to read from Mongo in parallel on defined partitions to utilize all of the shards and replica sets in the cluster. I had these threads put items on the queue. Setting a max size on the BlockingQueue would throttle my reads to not flood the consumers and run out of memory. The consumers would read from the queue and eventually be told to stop when all of the producers were done. This model was very performant and easy to monitor. By monitoring the size of the queue, I could see if the consumers or producers were working faster. With the BlockingQueue, the producers were not allowed to out-pace the consumers. The consumers didn’t have any problems with this though since DynamoDB is way more scalable than Mongo.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s