Introduction
This article describes caveats of Collectors#toMap, which is used to convert a Stream to a Map.
Following my earlier post on the pitfalls of Stream#toList(), this is another post about how to use Stream.
Stream is convenient, but it feels like a tragic monster trapped in Java's fences.
Compared with libraries in other languages that provide similar functionality, isn't the overall API rather awkward?
I started writing this article and then looked up existing blog posts, only to find that this topic is, unsurprisingly, already mentioned elsewhere.
TL;DR
- Collectors#toMap with two arguments throws an exception if there are duplicate keys
- To avoid exceptions on duplicate keys, specify a merge function as the third argument
Collectors#toMap
Collectors#toMap is a Collector that transforms Stream elements into key-value pairs.
Collectors#toMap has three overloaded methods with different numbers of arguments, and you must choose appropriately.
Collectors#toMap with two arguments
The signature of Collectors#toMap with two arguments is as follows.
    public static <T, K, U> Collector<T, ?, Map<K,U>> toMap(
        Function<? super T, ? extends K> keyMapper,
        Function<? super T, ? extends U> valueMapper
    )
In this method, keyMapper provides the key and valueMapper provides the value.
A typical usage looks like this.
    jshell> Stream.of(
       ...>     Map.entry("Alice", 1995),
       ...>     Map.entry("Bob", 1972),
       ...>     Map.entry("Charile", 1991)
       ...> ).collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue))
    $1 ==> {Bob=1972, Alice=1995, Charile=1991}
To keep the example simple, it uses Map.Entry as the stream elements, but in real code you often create a map from some field of a class.
This example appears to work, but Collectors#toMap with two arguments throws an exception when keys are duplicated 🙄
If there are two Alices, you get the following exception.
    jshell> Stream.of(
       ...>     Map.entry("Alice", 1995),
       ...>     Map.entry("Bob", 1972),
       ...>     Map.entry("Charile", 1991),
       ...>     Map.entry("Alice", 1992)
       ...> ).collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue))
       ...>
    |  Exception java.lang.IllegalStateException: Duplicate key Alice (attempted merging values 1995 and 1992)
    |        at Collectors.duplicateKeyException (Collectors.java:135)
    |        at Collectors.lambda$uniqKeysMapAccumulator$1 (Collectors.java:182)
    |        at ReduceOps$3ReducingSink.accept (ReduceOps.java:169)
    |        at Spliterators$ArraySpliterator.forEachRemaining (Spliterators.java:992)
    |        at AbstractPipeline.copyInto (AbstractPipeline.java:509)
    |        at AbstractPipeline.wrapAndCopyInto (AbstractPipeline.java:499)
    |        at ReduceOps$ReduceOp.evaluateSequential (ReduceOps.java:921)
    |        at AbstractPipeline.evaluate (AbstractPipeline.java:234)
    |        at ReferencePipeline.collect (ReferencePipeline.java:682)
    |        at (#3:6)
If you have little experience with Stream programming, or if you are not used to thinking about behavior on key collisions when dealing with maps,
you might not handle this pattern and end up with bugs.
It may work fine until one day data with duplicate keys appears and suddenly a runtime error occurs. That would be disastrous.
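In real code the key usually comes from a field rather than from Map.Entry. The following is a minimal sketch of the same failure with field-based keys, assuming Java 16 or later; the Person record and the method name are hypothetical and only exist for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TwoArgToMapDemo {
    // Hypothetical domain type, introduced only for this illustration.
    record Person(String name, int birthYear) {}

    // Two-argument toMap: throws IllegalStateException
    // as soon as two people share the same name.
    static Map<String, Integer> birthYearByName(List<Person> people) {
        return people.stream()
                .collect(Collectors.toMap(Person::name, Person::birthYear));
    }

    public static void main(String[] args) {
        var unique = List.of(new Person("Alice", 1995), new Person("Bob", 1972));
        System.out.println(birthYearByName(unique).get("Alice")); // prints 1995

        var duplicated = List.of(new Person("Alice", 1995), new Person("Alice", 1992));
        try {
            birthYearByName(duplicated);
        } catch (IllegalStateException e) {
            // Reached only when the input contains a duplicate name.
            System.out.println("Failed on duplicate key");
        }
    }
}
```

Nothing in the method signature hints at the failure, so it only shows up once the data contains a duplicate.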
I wish Java would avoid providing APIs that casually throw runtime exceptions.
It might be more beginner-friendly to only provide the three-argument Collectors#toMap instead.
Information leakage via exception messages
Java's exception message helpfully states that it attempted to merge values 1995 and 1992 for the key Alice.
However, APIs that output such messages must be handled carefully.
What if the keys or values of the map you are generating contain personal information?
Keys and values could appear in logs and lead to information leakage.
You can suppress exception messages, but then debugging becomes harder when a bug occurs.
In this example, the stack trace alone is enough to identify the cause, but it becomes difficult when exceptions occur in external libraries.
As introduced in ERR01-J (Do not allow exceptions to expose sensitive information), exception messages can carry sensitive data out of an application. With checked exceptions, you at least notice the risk of exposing credentials during implementation. It is much more troublesome that unchecked exceptions thrown by standard APIs include object values in their messages.
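One way to suppress the message is to catch the exception at the point where the map is built and rethrow one with a fixed, data-free message. This is only a sketch under my own assumptions: the class name, method name, and message text are hypothetical, and whether dropping the original exception is acceptable depends on your logging policy.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SanitizedToMap {
    // Collects with the two-argument toMap, but replaces the
    // IllegalStateException (whose message contains the key and both values)
    // with one that carries a fixed message.
    static Map<String, Integer> collectWithoutLeaking(List<Map.Entry<String, Integer>> entries) {
        try {
            return entries.stream()
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        } catch (IllegalStateException e) {
            // Deliberately not chaining `e` as the cause: its message would
            // reappear in any logged stack trace.
            throw new IllegalStateException("Duplicate key while building the map");
        }
    }

    public static void main(String[] args) {
        var entries = List.of(Map.entry("Alice", 1995), Map.entry("Alice", 1992));
        try {
            collectWithoutLeaking(entries);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Duplicate key while building the map"
        }
    }
}
```

The trade-off is exactly the one described above: the log no longer says which key collided, so debugging becomes harder.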
Collectors#toMap with three arguments
The signature of Collectors#toMap with three arguments is as follows.
    public static <T, K, U> Collector<T,?,Map<K,U>> toMap(
        Function<? super T,? extends K> keyMapper,
        Function<? super T,? extends U> valueMapper,
        BinaryOperator<U> mergeFunction
    )
Compared to the two-argument version, a third argument BinaryOperator<U> mergeFunction is added.
mergeFunction takes the existing value and the new value when a key collides, and returns the value to register.
If you want to prioritize the later Alice in the earlier example, you can write it like this.
    jshell> Stream.of(
       ...>     Map.entry("Alice", 1995),
       ...>     Map.entry("Bob", 1972),
       ...>     Map.entry("Charile", 1991),
       ...>     Map.entry("Alice", 1992)
       ...> ).collect(Collectors.toMap(
       ...>     Map.Entry::getKey,
       ...>     Map.Entry::getValue,
       ...>     (existing, replacement) -> replacement
       ...> ))
    $2 ==> {Bob=1972, Alice=1992, Charile=1991}
You can see that Alice now maps to 1992.
Since mergeFunction is just a BinaryOperator, you can also do the following.
    jshell> Stream.of(
       ...>     Map.entry("Alice", 1995),
       ...>     Map.entry("Bob", 1972),
       ...>     Map.entry("Charile", 1991),
       ...>     Map.entry("Alice", 1992)
       ...> ).collect(Collectors.toMap(
       ...>     Map.Entry::getKey,
       ...>     (entry) -> entry.getValue().toString(),
       ...>     (existing, replacement) -> existing + "," + replacement
       ...> ))
    $3 ==> {Bob=1972, Alice=1995,1992, Charile=1991}
In this example, the value of Map.entry is converted from Integer to String,
creating a Map<String, String>.
The mergeFunction concatenates the existing and new values with a comma when the key is duplicated.
With the three-argument Collectors#toMap, you can build maps that properly handle duplicate keys.
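A few other merge functions come up often. The sketch below shows three of them: keeping the first value, keeping the smaller value, and counting occurrences per key. The class and method names are hypothetical; the merge functions themselves use only standard JDK methods.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MergeFunctionExamples {
    static final List<Map.Entry<String, Integer>> ENTRIES = List.of(
            Map.entry("Alice", 1995),
            Map.entry("Bob", 1972),
            Map.entry("Alice", 1992));

    // Keep the value that was seen first and ignore later ones.
    static Map<String, Integer> keepFirst() {
        return ENTRIES.stream().collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (existing, replacement) -> existing));
    }

    // Keep the smaller of the two colliding values.
    static Map<String, Integer> keepSmallest() {
        return ENTRIES.stream().collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                Integer::min));
    }

    // Map every element to 1 and add the ones up on collision.
    static Map<String, Long> countPerKey() {
        return ENTRIES.stream().collect(Collectors.toMap(
                Map.Entry::getKey,
                entry -> 1L,
                Long::sum));
    }

    public static void main(String[] args) {
        System.out.println(keepFirst().get("Alice"));    // prints 1995
        System.out.println(keepSmallest().get("Alice")); // prints 1992
        System.out.println(countPerKey().get("Alice"));  // prints 2
    }
}
```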
Collectors#toMap with four arguments
Finally, let's check the signature of Collectors#toMap with four arguments. You probably won't use this often.
    public static <T, K, U, M extends Map<K, U>> Collector<T,?,M> toMap(
        Function<? super T,? extends K> keyMapper,
        Function<? super T,? extends U> valueMapper,
        BinaryOperator<U> mergeFunction,
        Supplier<M> mapFactory
    )
The added fourth argument mapFactory is a function that creates the Map instance to store the results.
Typically you pass something like HashMap::new to create a mutable map.
    jshell> Stream.of(
       ...>     Map.entry("Alice", 1995),
       ...>     Map.entry("Bob", 1972),
       ...>     Map.entry("Charile", 1991),
       ...>     Map.entry("Alice", 1992)
       ...> ).collect(Collectors.toMap(
       ...>     Map.Entry::getKey,
       ...>     (entry) -> entry.getValue().toString(),
       ...>     (existing, replacement) -> existing + "," + replacement,
       ...>     HashMap::new
       ...> ))
    $4 ==> {Bob=1972, Alice=1995,1992, Charile=1991}
The documentation says it is a supplier of an empty map, but it doesn't have to be empty.
    jshell> Stream.of(
       ...>     Map.entry("Alice", 1995),
       ...>     Map.entry("Bob", 1972),
       ...>     Map.entry("Charile", 1991),
       ...>     Map.entry("Alice", 1992)
       ...> ).collect(Collectors.toMap(
       ...>     Map.Entry::getKey,
       ...>     (entry) -> entry.getValue().toString(),
       ...>     (existing, replacement) -> existing + "," + replacement,
       ...>     () -> {
       ...>         var map = new HashMap<String, String>();
       ...>         map.put("Dave", "2000");
       ...>         return map;
       ...>     }
       ...> ))
    $7 ==> {Bob=1972, Alice=1995,1992, Charile=1991, Dave=2000}
You probably won't use it like this, but you can start with an initialized map and add data to it.
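A more common reason to reach for the fourth argument is to control iteration order. The sketch below passes LinkedHashMap::new to keep the order in which keys were first seen, and TreeMap::new to sort by key. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MapFactoryExamples {
    static final List<Map.Entry<String, Integer>> ENTRIES = List.of(
            Map.entry("Bob", 1972),
            Map.entry("Alice", 1995),
            Map.entry("Dave", 2000),
            Map.entry("Alice", 1992));

    // LinkedHashMap iterates keys in the order they were first inserted.
    static Map<String, Integer> inEncounterOrder() {
        return ENTRIES.stream().collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (existing, replacement) -> replacement,
                LinkedHashMap::new));
    }

    // TreeMap iterates keys in their natural (here: alphabetical) order.
    static Map<String, Integer> sortedByKey() {
        return ENTRIES.stream().collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (existing, replacement) -> replacement,
                TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(inEncounterOrder()); // prints {Bob=1972, Alice=1992, Dave=2000}
        System.out.println(sortedByKey());      // prints {Alice=1992, Bob=1972, Dave=2000}
    }
}
```

Replacing the value of an existing key does not move it within a LinkedHashMap, which is why Alice stays in second position.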
What if I want duplicates as a List?
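If you want to keep every value of a duplicated key as a List, Collectors#groupingBy is the usual tool.
However, if you've understood Collectors#toMap behavior from this article, you'll realize you can implement it with Collectors#toMap as well.
For example, by writing the following for valueMapper and mergeFunction, you can get the same result as Collectors#groupingBy.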
    jshell> Stream.of(
       ...>     Map.entry("Alice", 1995),
       ...>     Map.entry("Bob", 1972),
       ...>     Map.entry("Charile", 1991),
       ...>     Map.entry("Alice", 1992)
       ...> ).collect(Collectors.toMap(
       ...>     Map.Entry::getKey,
       ...>     (entry) -> {
       ...>         var list = new ArrayList<Integer>();
       ...>         list.add(entry.getValue());
       ...>
       ...>         return list;
       ...>     },
       ...>     (existing, replacement) -> {
       ...>         existing.addAll(replacement);
       ...>
       ...>         return existing;
       ...>     }
       ...> ))
    $8 ==> {Bob=[1972], Alice=[1995, 1992], Charile=[1991]}
But it is tedious to write this every time, so if you want to collect all duplicates into another collection, use Collectors#groupingBy.
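For comparison, here is a minimal sketch of the Collectors#groupingBy version. It combines groupingBy with Collectors.mapping so that each list holds only the values rather than the whole entries. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingByExample {
    // Groups entries by key and collects the values of each group into a List.
    static Map<String, List<Integer>> yearsByName(List<Map.Entry<String, Integer>> entries) {
        return entries.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        var entries = List.of(
                Map.entry("Alice", 1995),
                Map.entry("Bob", 1972),
                Map.entry("Alice", 1992));
        System.out.println(yearsByName(entries).get("Alice")); // prints [1995, 1992]
        System.out.println(yearsByName(entries).get("Bob"));   // prints [1972]
    }
}
```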
Summary
I covered the caveats of Collectors#toMap and the typical scenarios where it is used.
As you should understand by now, when using Collectors#toMap you should assume you always need the third argument.
When creating a Map from a Stream, the thought process should be:
- The map key is ___, so transform stream elements like ___
- The map value is ___, so transform stream elements like ___
- If keys are duplicated, adopt the result of ___
Only after you conclude that "the map key will never be duplicated" should you remove the third argument. If we are going to force programmers to think about this, why not avoid defining the two-argument method entirely and instead provide something like:
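    public class ToMap {
        // Declared as a generic method with the shape (T, T) -> T, so that
        // ToMap::neverDuplicateKey can be passed where a BinaryOperator is expected.
        public static <T> T neverDuplicateKey(T existing, T replacement) {
            throw new IllegalStateException(...);
        }
    }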
and use it like:
    Stream.of(
        Map.entry("Alice", 1995),
        Map.entry("Bob", 1972),
        Map.entry("Charile", 1991)
    ).collect(Collectors.toMap(
        Map.Entry::getKey,
        Map.Entry::getValue,
        ToMap::neverDuplicateKey
    ))
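A compilable sketch of this idea might look like the following. The helper has to be a generic method returning T for the method reference to fit BinaryOperator; the exception message is a placeholder of my own choosing that deliberately omits the values, and the ToMapUsage class exists only to demonstrate the call.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ToMap {
    private ToMap() {}

    // Merge function that states the intent "keys are never duplicated".
    // It is only invoked on a key collision, and then always throws.
    static <T> T neverDuplicateKey(T existing, T replacement) {
        // Placeholder message (an assumption); it does not embed the values.
        throw new IllegalStateException("Duplicate key in a map declared to have unique keys");
    }
}

class ToMapUsage {
    static Map<String, Integer> strictMap(Stream<Map.Entry<String, Integer>> entries) {
        return entries.collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                ToMap::neverDuplicateKey));
    }

    public static void main(String[] args) {
        var map = strictMap(Stream.of(Map.entry("Alice", 1995), Map.entry("Bob", 1972)));
        System.out.println(map.get("Bob")); // prints 1972
    }
}
```

The behavior on duplicates is the same as with the two-argument version, but the decision is now visible at the call site.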
It sometimes feels like Java's collections are designed to make it easy to write code that can fail at runtime.
Enough complaining about Java. I hope this article helps you avoid runtime exceptions in code using Collectors#toMap.

