Marshalling and unmarshalling JSON is the bread and butter of ETL. Many libraries exist to make this data wrangling more seamless, and one that has gained a lot of popularity lately is Circe. Circe depends on Shapeless for generic derivation, which lets it automatically deserialize a JSON string into a domain model. However, these JSON libraries also come with a catch. For example, you keep running into decoding failures if you don’t know that you need the prepare method to handle a class with a default value on a non-optional field. Decoding nested arrays and objects is hard if you don’t know a few vital workarounds. You also run into questions such as: what is the difference between auto and semi-auto derivation, and when should you use one over the other?
These questions can be daunting. I spent days and weeks scratching my head and pulling my hair to understand and learn the best practices for this JSON library. I boiled them down to 6 simple tips that can save you tons of time when using Circe in your future Scala projects.
Setup
Before we start, let’s set up the environment. If you haven’t set up SBT, you can look at the documentation here. Paste the dependencies into your build.sbt file:
val circeVersion = "0.11.1"

libraryDependencies ++= Seq(
  "io.circe" %% "circe-core",
  "io.circe" %% "circe-generic",
  "io.circe" %% "circe-parser"
).map(_ % circeVersion)
Let’s get down to the 6 tips, with examples of ways to decode JSON in Scala:
Decoding Objects
Define the domain model for the JSON value as a case class, and define an implicit encoder and decoder for it with Circe.
When you define these implicit values in the companion object (which is part of the implicit scope), you don’t pay the import tax when you call decode[A].
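As a minimal sketch of this tip, assuming Circe 0.11.x on Scala 2.12: the Book fields and the sample JSON below are hypothetical and only serve as an illustration. The encoder and decoder are derived with semi-automatic derivation and placed in the companion object.

```scala
import io.circe.{Decoder, Encoder}
import io.circe.generic.semiauto.{deriveDecoder, deriveEncoder}
import io.circe.parser.decode

// Hypothetical domain model, used only for illustration.
final case class Book(title: String, pages: Int)

object Book {
  // Defined in the companion object, so they are found through the
  // implicit scope of Book without any import at the call site.
  implicit val bookDecoder: Decoder[Book] = deriveDecoder[Book]
  implicit val bookEncoder: Encoder[Book] = deriveEncoder[Book]
}

object DecodeObjectExample {
  val json: String = """{"title": "Programming in Scala", "pages": 852}"""

  // decode returns Either[io.circe.Error, Book] instead of throwing.
  val result: Either[io.circe.Error, Book] = decode[Book](json)

  def main(args: Array[String]): Unit = println(result)
}
```

Note that the call site only imports decode; nothing decoder-related has to be imported for Book.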
Decoding Arrays
When you decode a JSON array of objects, keep the case class modeling a single object. Then, when you invoke the decode parser, specify the data structure that holds the JSON values. In this case, it is a List: parser.decode[List[Book]].
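A short sketch of the call above, again with a hypothetical Book model and made-up sample data. The only thing that changes compared with decoding a single object is the type passed to decode.

```scala
import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
import io.circe.parser

// Hypothetical model of ONE element of the array.
final case class Book(title: String, pages: Int)

object Book {
  implicit val bookDecoder: Decoder[Book] = deriveDecoder[Book]
}

object DecodeArrayExample {
  val json: String =
    """[
      |  {"title": "Book A", "pages": 100},
      |  {"title": "Book B", "pages": 200}
      |]""".stripMargin

  // Circe supplies a Decoder[List[A]] whenever a Decoder[A] is available,
  // so only the container type has to be named here.
  val books: Either[io.circe.Error, List[Book]] = parser.decode[List[Book]](json)

  def main(args: Array[String]): Unit = println(books)
}
```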
Decoding Automatically
Circe has an automatic derivation feature that decodes a JSON string into the domain model for you, without any explicit decoder definition. You only need to import io.circe.generic.auto._
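For illustration, here is a minimal sketch of automatic derivation. The Author and Novel case classes and the sample JSON are hypothetical; note that no Decoder is written anywhere.

```scala
import io.circe.generic.auto._
import io.circe.parser.decode

// Hypothetical models; no companion objects or implicit decoders needed.
final case class Author(name: String)
final case class Novel(title: String, author: Author)

object AutoDecodeExample {
  val json: String = """{"title": "Dune", "author": {"name": "Frank Herbert"}}"""

  // The auto._ import lets the compiler derive decoders for Novel
  // and for the nested Author at this call site.
  val novel: Either[io.circe.Error, Novel] = decode[Novel](json)

  def main(args: Array[String]): Unit = println(novel)
}
```

The trade-off is that the derivation happens wherever the import is in scope, whereas the semi-auto approach derives the decoder once in the place where you define it.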
Decoding Manually
A few things to know when you write a decoder by hand:
- You can write your own decoder with HCursor.
- If you define the decoder as an anonymous function (hCursor: HCursor) => ..., you need to declare the type of the value as Decoder[A], so that the function is converted to a decoder.
- .get[A]("key") and .downField("key").as[A] are the same thing.
- If the JSON is nested, specify the type you want for the nested field, and Circe recursively decodes that value with the decoder for that type.
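The points above can be sketched as follows. The Person and Address models and the sample JSON are hypothetical, and the lambda syntax assumes Scala 2.12 or later, where a function literal can be converted to a single-abstract-method trait such as Decoder.

```scala
import io.circe.{Decoder, HCursor}
import io.circe.parser.decode

// Hypothetical models, used only for illustration.
final case class Address(city: String, zip: String)
final case class Person(name: String, age: Int, address: Address)

object Address {
  // The explicit Decoder[Address] type is what turns the lambda into a decoder.
  implicit val addressDecoder: Decoder[Address] = (c: HCursor) =>
    for {
      city <- c.get[String]("city")            // shorthand form
      zip  <- c.downField("zip").as[String]    // equivalent long form
    } yield Address(city, zip)
}

object Person {
  implicit val personDecoder: Decoder[Person] = (c: HCursor) =>
    for {
      name    <- c.get[String]("name")
      age     <- c.get[Int]("age")
      // Asking for Address makes Circe use addressDecoder for the nested object.
      address <- c.get[Address]("address")
    } yield Person(name, age, address)
}

object ManualDecodeExample {
  val json: String =
    """{"name": "Ada", "age": 36, "address": {"city": "London", "zip": "N1"}}"""

  val person: Either[io.circe.Error, Person] = decode[Person](json)

  def main(args: Array[String]): Unit = println(person)
}
```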
Decoding an Array of Objects of Arrays
Deeply nested JSON can be tricky to decode. However, with a little bit of help from Cats, you can retrieve values from an array of objects that themselves contain arrays.
Assume there is an incoming JSON string like this:
[
{ "name":"productResource", "orderItems": [ { "voucher": { "campaignNumber":12, "discount":20, "subscriptionPeriod":"June" }}, { "voucher": { "campaignNumber":13, "discount":24 }}] },
{ "name":"productResource2", "orderItems": [ { "voucher": { "campaignNumber":13, "discount":24 }}] },
{ "name":"productResource3", "orderItems": [ { "voucher": { "campaignNumber":15, "discount":28 }}] }
]
This is a priceService object.
Let’s say you want to get the campaignNumber values, which are 2 levels deep, and assign them to a different model.
With a little help from HCursor and Cats Traverse, you can retrieve campaignNumber.
A few takeaways for this kind of traversal:
- If there is a nested array and you want to traverse it, extract the outer level to a List[Json] and use the Cats traverse method.
- Each item mapped inside the for-comprehension returns Either[DecodingFailure, A]. Since the items are wrapped in a List, a plain map would return List[Either[DecodingFailure, A]].
- Cats Traverse flips List[Either] to Either[List].
- Every Json value can be traversed with HCursor, so use an HCursor to walk the nested JSON and get the campaignNumber attribute.
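The traversal described above can be sketched roughly as follows. ProductResource and its campaignNumbers field are hypothetical names for the target model, and the sample data is the JSON shown earlier in this section. The explicit type parameters on traverse are an assumption to keep the sketch compiling on Scala 2.12 without the -Ypartial-unification compiler flag.

```scala
import cats.implicits._
import io.circe.{Decoder, HCursor, Json}
import io.circe.parser.decode

// Hypothetical target model: the name plus the flattened campaign numbers.
final case class ProductResource(name: String, campaignNumbers: List[Int])

object ProductResource {
  implicit val decoder: Decoder[ProductResource] = (c: HCursor) =>
    for {
      name       <- c.get[String]("name")
      // Step 1: pull the nested array out as raw Json values.
      orderItems <- c.get[List[Json]]("orderItems")
      // Step 2: decode each item; traverse turns the would-be
      // List[Either[DecodingFailure, Int]] into Either[DecodingFailure, List[Int]].
      numbers    <- orderItems.traverse[Decoder.Result, Int] { item =>
                      item.hcursor.downField("voucher").get[Int]("campaignNumber")
                    }
    } yield ProductResource(name, numbers)
}

object NestedDecodeExample {
  val json: String =
    """[
      |{ "name":"productResource", "orderItems": [ { "voucher": { "campaignNumber":12, "discount":20, "subscriptionPeriod":"June" }}, { "voucher": { "campaignNumber":13, "discount":24 }}] },
      |{ "name":"productResource2", "orderItems": [ { "voucher": { "campaignNumber":13, "discount":24 }}] },
      |{ "name":"productResource3", "orderItems": [ { "voucher": { "campaignNumber":15, "discount":28 }}] }
      |]""".stripMargin

  val resources: Either[io.circe.Error, List[ProductResource]] =
    decode[List[ProductResource]](json)

  def main(args: Array[String]): Unit = println(resources)
}
```

If any single item fails to decode, traverse stops at the first DecodingFailure and the whole result is a Left.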
Handling Class with Default on Non-Optional Field
If you create a domain model with a default argument, there is no problem when the caller doesn’t provide that argument. However, decoding JSON that is missing that field fails with a DecodingFailure: the derived decoder cannot decode a non-optional field that is absent, unless you make that field an Option.
But what if you don’t want to wrap the argument in an Option because it is not optional?
The answer is to use prepare.
prepare lets you modify the JSON before Circe decodes it, so that decoding doesn’t fail.
Let’s say, in this example, that not all the companies that provided data have a public flag.
[
{"industry":"tech", "year":1990, "name":"Intel", "public": true},
{"industry":"tech", "year":2006, "name":"Netflix"},
{"industry":"Consumer Goods", "year":1860, "name":"Pepsoden", "public": true}
]
Therefore, we need to set the public flag to false if it doesn’t exist.
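One way to do this, as a sketch: the Company case class below is a hypothetical model for the data above. The derived decoder is wrapped with prepare, which adds "public": false to the JSON object whenever the key is missing.

```scala
import io.circe.{ACursor, Decoder, Json}
import io.circe.generic.semiauto.deriveDecoder
import io.circe.parser.decode

// Hypothetical model; public is non-optional but has a default value.
final case class Company(industry: String, year: Int, name: String, public: Boolean = false)

object Company {
  implicit val decoder: Decoder[Company] =
    deriveDecoder[Company].prepare { (cursor: ACursor) =>
      // Rewrite the focused JSON before the derived decoder sees it.
      cursor.withFocus { json =>
        json.mapObject { obj =>
          if (obj.contains("public")) obj
          else obj.add("public", Json.fromBoolean(false))
        }
      }
    }
}

object PrepareExample {
  val json: String =
    """[
      |{"industry":"tech", "year":1990, "name":"Intel", "public": true},
      |{"industry":"tech", "year":2006, "name":"Netflix"},
      |{"industry":"Consumer Goods", "year":1860, "name":"Pepsoden", "public": true}
      |]""".stripMargin

  val companies: Either[io.circe.Error, List[Company]] = decode[List[Company]](json)

  def main(args: Array[String]): Unit = println(companies)
}
```

In this sketch the fallback value is written inside prepare, so it has to be kept in sync with the default on the case class by hand.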
And that’s it!
In Summary:
There are 3 ways to decode JSON with Circe – auto, semi-auto, and manual.
Circe can automatically decode standard type containers such as List or Option.
You don’t need to write a decoder for every value in the JSON by hand: you can use deriveDecoder, and it builds the decoder for a case class from the decoders of its fields.
If you want to deserialize multiple nested objects of arrays, you can use Cats Traverse to get the nested attributes.
When you want to preprocess the JSON before letting Circe decode it for you, use prepare.
I hope this post helps clear up any confusion about Circe. Any feedback is welcome, and if you find any other gotchas in parsing JSON with Circe, please comment below.
The full source code of this tutorial is here.