Basic Parsers
In this chapter we will learn the basics of using primitive parsers, and combining them to build larger parsers.
But first we need to import some stuff. Fine-grained imports are supported but it’s usually fine to just import everything. We also need cats implicits for applicative syntax below.
import atto._, Atto._
import cats.implicits._
Rock on, let’s parse an integer!
scala> int parseOnly "123abc"
res0: atto.ParseResult[Int] = Done(abc,123)
This result means we successfully parsed an Int and have the text "abc" left over. We’ll talk more about this momentarily. But let’s back up. What’s this int thing?
scala> int
res1: atto.Parser[Int] = int
A Parser[A] is a computation that consumes characters and produces a value of type A, in this case Int. Let’s look at another predefined parser that matches only characters where isLetter is true.
scala> letter
res2: atto.Parser[Char] = letter
We can ask a parser to parse a string, and we get back a ParseResult[A]. The Done constructor shows the remaining input (if any) and the answer.
scala> letter.parse("x")
res3: atto.ParseResult[Char] = Done(,x)
scala> letter.parse("xyz")
res4: atto.ParseResult[Char] = Done(yz,x)
The Fail constructor shows us the remaining input, the parsing stack (ignore this for now), and a description of the failure.
scala> letter.parse("1")
res5: atto.ParseResult[Char] = Fail(1,List(),Failure reading:letter)
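When only the answer matters, a result can be collapsed into a standard library type. The following is a minimal sketch, assuming the option and either methods on ParseResult (neither appears elsewhere in this chapter). It uses parseOnly, as in the first example, so that the string is treated as the complete input. The names ok, bad and err are made up for illustration.

```scala
import atto._, Atto._

// Successful parse: option keeps the parsed value and drops the leftover input.
val ok: Option[Char] = letter.parseOnly("xyz").option

// Failed parse: option carries no value.
val bad: Option[Char] = letter.parseOnly("1").option

// Failed parse: either keeps a description of the failure on the left.
val err: Either[String, Char] = letter.parseOnly("1").either
```

Note that both methods discard the remaining input, so they suit the case where you only care whether a value was produced.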
The Partial constructor indicates that the parser has neither succeeded nor failed; more input is required before we will know. We can feed more data to continue parsing. Our parsers thus support incremental parsing, which allows us to parse directly from a stream, for example.
scala> letter.parse("")
res6: atto.ParseResult[Char] = Partial(atto.Parser$Internal$Partial$$Lambda$10429/1489281972@3494b2fe)
scala> letter.parse("").feed("abc")
res7: atto.ParseResult[Char] = Done(bc,a)
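With all three constructors introduced, a result can be handled by pattern matching. This is a sketch that assumes Done, Fail and Partial are case classes in the ParseResult companion object with the fields shown in the output above. The describe helper is hypothetical and not part of atto.

```scala
import atto._, Atto._
import atto.ParseResult.{Done, Fail, Partial}

// Hypothetical helper (not part of atto): summarize a result as text.
def describe[A](r: ParseResult[A]): String = r match {
  case Done(rest, a)      => s"done: $a, remaining input: '$rest'"
  case Fail(rest, _, msg) => s"failed at '$rest': $msg"
  case Partial(_)         => "need more input"
}

val succeeded = describe(letter.parse("xyz")) // same input as res4
val failed    = describe(letter.parse("1"))   // same input as res5
val waiting   = describe(letter.parse(""))    // same input as res6
```

Matching on all three cases makes it explicit that a caller of parse has to decide what to do when the input runs out.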
The many combinator turns a Parser[A] into a Parser[List[A]].
scala> many(letter).parse("abc")
res8: atto.ParseResult[List[Char]] = Partial(atto.Parser$Internal$Partial$$Lambda$10429/1489281972@30a92438)
scala> many(letter).parse("abc").feed("def")
res9: atto.ParseResult[List[Char]] = Partial(atto.Parser$Internal$Partial$$Lambda$10429/1489281972@60ca2e90)
There may be more letters coming, so we can say we’re done to indicate that there is no more input.
scala> many(letter).parse("abc").feed("def").done
res10: atto.ParseResult[List[Char]] = Done(,List(a, b, c, d, e, f))
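The parse, feed, done sequence generalizes to input that arrives in any number of pieces. Here is a rough sketch. parseChunks is a hypothetical helper, not part of atto, and skipping empty chunks is a precaution based on the assumption that feeding an empty string may be read as end of input.

```scala
import atto._, Atto._

// Hypothetical helper (not part of atto): feed a parser one chunk at a time,
// then call done to signal that no more input is coming.
def parseChunks[A](p: Parser[A], chunks: List[String]): ParseResult[A] =
  chunks
    .filter(_.nonEmpty) // assumption: an empty chunk could be mistaken for end of input
    .foldLeft(p.parse(""))((result, chunk) => result.feed(chunk))
    .done

// Same input as the example above, split into two chunks.
val letters = parseChunks(many(letter), List("abc", "def"))
```

Starting from parse("") yields a result that is waiting for input, as in the letter.parse("") example earlier, so every chunk goes through the same feed step.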
Parser is a functor, so you can map the result and turn it into something else.
scala> many(letter).map(_.mkString).parse("abc").feed("def").done
res11: atto.ParseResult[String] = Done(,abcdef)
The ~ combinator turns Parser[A], Parser[B] into Parser[(A,B)].
scala> letter ~ digit
res12: atto.Parser[(Char, Char)] = (letter) ~ digit
scala> (letter ~ digit).parse("a1")
res13: atto.ParseResult[(Char, Char)] = Done(,(a,1))
scala> (many(letter) ~ many(digit)).parse("aaa")
res14: atto.ParseResult[(List[Char], List[Char])] = Partial(atto.Parser$Internal$Partial$$Lambda$10429/1489281972@32cca68e)
scala> (many(letter) ~ many(digit)).parse("aaa").feed("bcd123").done
res15: atto.ParseResult[(List[Char], List[Char])] = Done(,(List(a, a, a, b, c, d),List(1, 2, 3)))
scala> (many(letter) ~ many(digit)).map { case (a, b) => a ++ b } .parse("aaa").feed("bcd123").done
res16: atto.ParseResult[List[Char]] = Done(,List(a, a, a, b, c, d, 1, 2, 3))
Destructuring the pair in map is a pain, and it gets worse with nested pairs.
scala> (letter ~ int ~ digit ~ byte)
res17: atto.Parser[(((Char, Int), Char), Byte)] = (((letter) ~ int) ~ digit) ~ byte
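To see what that nesting means in practice, here is a small sketch with three parsers. The pattern has to mirror the left-nested shape of the tuple exactly. The names triple and tripleResult are made up for illustration, and option is assumed to be available on ParseResult.

```scala
import atto._, Atto._

// Three parsers joined with ~ yield ((Char, Char), Char),
// so the pattern in map must nest the same way.
val triple: Parser[String] =
  (letter ~ digit ~ letter).map { case ((a, b), c) => s"$a$b$c" }

val tripleResult = triple.parseOnly("a1b").option
```

Each additional parser adds another level of parentheses on the left of the pattern.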
But have no fear, Parser is an applicative functor.
scala> (many(letter), many(digit)).mapN(_ ++ _).parse("aaa").feed("bcd123").done
res18: atto.ParseResult[List[Char]] = Done(,List(a, a, a, b, c, d, 1, 2, 3))
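The applicative style is handy for building data types directly. The sketch below uses a hypothetical Cell case class (invented for this example) and passes its constructor to mapN. It again assumes the option method on ParseResult.

```scala
import atto._, Atto._
import cats.implicits._

// Hypothetical data type for illustration: a cell reference such as "b12".
final case class Cell(column: Char, row: Int)

// mapN hands each parser's result to the function as a separate argument,
// so no nested tuple pattern is needed.
val cell: Parser[Cell] = (letter, int).mapN(Cell.apply)

val cellResult = cell.parseOnly("b12").option
```

Adding a third field only means adding a third parser to the tuple; the function passed to mapN stays flat.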
In fact, it’s a monad. This allows the result of one parser to influence the behavior of subsequent parsers. Here we build a parser that parses an integer followed by an arbitrary string of that length.
scala> val p = for { n <- int; c <- take(n) } yield c
p: atto.Parser[String] = (int) flatMap ...
scala> p.parse("3abcdef")
res19: atto.ParseResult[String] = Done(def,abc)
scala> p.parse("4abcdef")
res20: atto.ParseResult[String] = Done(ef,abcd)
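The for-comprehension above is sugar for flatMap. Written out by hand, it shows how the parsed count is passed to the next parser. This sketch also feeds extra input when the prefix asks for more characters than have arrived so far. The names lengthPrefixed, complete and resumed are made up, and option is assumed to be available on ParseResult.

```scala
import atto._, Atto._

// The same parser without for-comprehension syntax:
// the number parsed by int decides how many characters take consumes.
val lengthPrefixed: Parser[String] = int.flatMap(n => take(n))

// Enough input is available up front (same input as res19).
val complete = lengthPrefixed.parse("3abcdef").option

// Only three of the five requested characters have arrived, so feed the rest.
val resumed = lengthPrefixed.parse("5abc").feed("de").option
```

Because take(n) is only constructed after int has produced n, this kind of parser cannot be expressed with ~ or mapN alone.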
In the next chapter we will build up parsers for larger structures.