Parallel Map in Java (From Kotlin)

written in collections, java, kotlin, parallel

Following up on my previous post, I was curious what a parallel map operation would look like using Java’s parallelStream. Here’s what I found out.

In Java, you use map like this:

    import java.util.stream.Collectors

    fun main(args: Array<String>) {
        val output = (1..100).toList()
            .stream()
            .map { it * 2 }
            .collect(Collectors.toList())
        println(output)
    }

(In case you’re wondering, I’m using Java collections from Kotlin.)

And to do a parallel map you can simply do:

    import java.util.stream.Collectors

    fun main(args: Array<String>) {
        val output = (1..100).toList()
            .parallelStream()
            .map { it * 2 }
            .collect(Collectors.toList())
        println(output)
    }

No need to write a special pmap operation like we did for Kotlin. Just call parallelStream and that’s it. Pretty cool, right?

I was curious how this solution compares to the one in my previous post, so I decided to time it too.

    import java.util.stream.Collectors
    import kotlin.system.measureTimeMillis

    fun main(args: Array<String>) {
        val time = measureTimeMillis {
            val output = (1..100).toList()
                .parallelStream()
                .map {
                    Thread.sleep(100)
                    it * 2
                }
                .collect(Collectors.toList())
            println(output)
        }
        println("Total time: $time")
    }

In this case I’m setting a delay of 100 milliseconds (instead of 1,000 like I did in my previous post) [1]. I was expecting the total time to be close to 100 milliseconds, just like it was for the Kotlin pmap. Instead, I got something close to 5,000.

It turns out parallelStream runs on the default ForkJoinPool.commonPool, whose parallelism level is tied by default to the number of available processors (one less than that number, with the calling thread also taking part in the work). In this case there are 2 processors: 100 operations * 100 milliseconds / 2 processors = 5,000 milliseconds. You can check the number of available processors by adding this line to the script:

println(Runtime.getRuntime().availableProcessors())
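The arithmetic above can be sketched as a small script. This is only a rough model: estimateMillis is a hypothetical helper made up for illustration, and it assumes every task blocks for the same amount of time and that the workers are kept evenly busy. The script also prints the common pool's configured parallelism, which may differ from the processor count.

```kotlin
import java.util.concurrent.ForkJoinPool

// Hypothetical helper (not part of any library): rough wall-clock estimate
// for `tasks` blocking tasks of `taskMillis` each, spread over `workers` threads.
fun estimateMillis(tasks: Int, taskMillis: Long, workers: Int): Long {
    val rounds = (tasks + workers - 1) / workers // ceiling division
    return rounds * taskMillis
}

fun main() {
    val processors = Runtime.getRuntime().availableProcessors()
    val commonParallelism = ForkJoinPool.commonPool().parallelism
    println("Available processors: $processors")
    println("Common pool parallelism: $commonParallelism")
    // 100 tasks of 100 ms each on 2 workers, as in the timing above
    println("Estimate with 2 workers: ${estimateMillis(100, 100, 2)} ms")
}
```

With 100 workers the same helper gives 100 milliseconds, which is the number I was naively expecting.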

But, I want more parallelism!

What if we want to increase the parallelism level? There are 2 ways to achieve this.

The first one is to make our code run in a custom thread pool of our choice. Unfortunately, Java doesn’t make it easy to provide a custom thread pool for a parallel stream, but the workaround is not so bad either.
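One commonly cited workaround is to start the stream from inside a task submitted to your own ForkJoinPool. Below is a minimal sketch of that idea. The helper name parallelMapIn and the pool size of 10 are assumptions chosen for illustration, and the trick relies on fork/join tasks staying in the pool they were started from, which is an implementation behavior rather than something the Stream API documents.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ForkJoinPool
import java.util.stream.Collectors

// Hypothetical helper: runs a parallel stream from inside `pool`, so the
// stream's tasks are expected to execute on that pool instead of the common pool.
fun <T, R> parallelMapIn(pool: ForkJoinPool, items: List<T>, transform: (T) -> R): List<R> =
    pool.submit(Callable<List<R>> {
        items.parallelStream()
            .map { transform(it) }
            .collect(Collectors.toList())
    }).get() // blocks until the whole stream has been processed

fun main() {
    val pool = ForkJoinPool(10) // assumed parallelism level, pick your own
    try {
        val output = parallelMapIn(pool, (1..100).toList()) { it * 2 }
        println(output)
    } finally {
        pool.shutdown() // release the pool's threads when done
    }
}
```

Because the pool is private to this call site, a slow mapping function only ties up these threads and leaves the shared common pool alone.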

The other option is to change the ForkJoinPool.commonPool parallelism level by system property like this:

System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "10")

Unfortunately, this doesn’t work on Kotlin Playground, so you’ll have to try it on your own machine or take my word that it works.
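If you try this locally, keep in mind that the property is only read when the common pool is initialized, so it has to be set before anything in the process uses the pool. Here is a small sketch for checking whether the setting took effect. The helper name requestCommonPoolParallelism is made up for illustration, and the value you see printed depends on whether the pool already existed when the property was set.

```kotlin
import java.util.concurrent.ForkJoinPool

// Hypothetical helper: asks for a given common pool parallelism.
// Only effective if called before the common pool is first initialized.
fun requestCommonPoolParallelism(level: Int) {
    System.setProperty(
        "java.util.concurrent.ForkJoinPool.common.parallelism",
        level.toString()
    )
}

fun main() {
    requestCommonPoolParallelism(10)
    // Reports the parallelism the common pool was actually created with
    println("Common pool parallelism: ${ForkJoinPool.commonPool().parallelism}")
}
```

Passing -Djava.util.concurrent.ForkJoinPool.common.parallelism=10 on the java command line sets the same property before any code runs, which avoids the ordering problem.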

It’s worth noting that with the second approach you’d still be using the same default thread pool, which is shared globally across the app. As you can imagine, this can be EXTREMELY BAD: a long-running or blocking parallel stream can starve every other part of the application that relies on the common pool. Some would even argue this is reason enough not to use parallelStream at all, although that seems a little extreme if you ask me.


  1. Otherwise the execution takes too long and doesn’t complete. This is probably a limitation of Kotlin Playground.

