Basic Parsers

In this chapter we will learn the basics of using primitive parsers and combining them to build larger parsers.

But first we need some imports. Fine-grained imports are supported, but it’s usually fine to just import everything. We also need the cats implicits for the applicative syntax (mapN) used below.

import atto._, Atto._
import cats.implicits._

Rock on, let’s parse an integer!

scala> int parseOnly "123abc"
res0: atto.ParseResult[Int] = Done(abc,123)

This result means we successfully parsed an Int and have the text "abc" left over. We’ll talk more about this momentarily. But let’s back up. What’s this int thing?

scala> int
res1: atto.Parser[Int] = int

A Parser[A] is a computation that consumes characters and produces a value of type A; in this case, Int. Let’s look at another predefined parser, letter, which matches a single character for which isLetter is true.

scala> letter
res2: atto.Parser[Char] = letter
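To make "a computation that consumes characters and produces a value" concrete, here is a deliberately simplified sketch in plain Scala. It is a hypothetical model, not atto’s implementation: Parser, run and satisfy are illustrative names, and unlike atto this version is not incremental, so it simply fails (None) on empty input. A successful parse yields the value together with the unconsumed input, much like Done above.

```scala
// Hypothetical, simplified model for illustration only; this is not how atto
// implements Parser. A parser is a function from the input to either failure
// (None) or a result paired with the input it did not consume.
object SimpleParsers {
  final case class Parser[A](run: String => Option[(A, String)])

  // Consume a single character that satisfies the predicate.
  def satisfy(p: Char => Boolean): Parser[Char] =
    Parser[Char] { in =>
      if (in.nonEmpty && p(in.head)) Some((in.head, in.tail)) else None
    }

  // Like atto's letter: one character for which isLetter is true.
  val letter: Parser[Char] = satisfy(_.isLetter)

  // One or more decimal digits read as an Int. Unlike a real integer parser,
  // there is no sign handling, and an overly long run of digits will throw.
  val int: Parser[Int] =
    Parser[Int] { in =>
      val digits = in.takeWhile(_.isDigit)
      if (digits.isEmpty) None
      else Some((digits.toInt, in.drop(digits.length)))
    }
}
```

For example, SimpleParsers.int.run("123abc") evaluates to Some((123, "abc")): the same value and leftover text that atto reports as Done(abc,123).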

We can ask a parser to parse a string, and we get back a ParseResult[A]. The Done constructor shows the remaining input (if any) and the answer.

scala> letter.parse("x")
res3: atto.ParseResult[Char] = Done(,x)

scala> letter.parse("xyz")
res4: atto.ParseResult[Char] = Done(yz,x)

The Fail constructor shows us the remaining input, the parsing stack (ignore this for now), and a description of the failure.

scala> letter.parse("1")
res5: atto.ParseResult[Char] = Fail(1,List(),Failure reading:letter)

The Partial constructor indicates that the parser has neither succeeded nor failed; more input is required before we will know. We can feed it more data to continue parsing. Our parsers thus support incremental parsing, which allows us, for example, to parse directly from a stream.

scala> letter.parse("")
res6: atto.ParseResult[Char] = Partial(atto.Parser$Internal$Partial$$Lambda$10429/1489281972@3494b2fe)

scala> letter.parse("").feed("abc")
res7: atto.ParseResult[Char] = Done(bc,a)
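The three kinds of result can be modelled as a small algebraic data type. The sketch below is hypothetical and only loosely mirrors atto’s Done, Fail and Partial (for instance, it drops the parsing stack, and letter here is a plain function rather than a Parser value). The key idea is that Partial holds a continuation, and feed simply calls it with the next chunk of input.

```scala
// Hypothetical, simplified model of incremental results; it only loosely
// mirrors atto's Done / Fail / Partial and is not atto's implementation.
object IncrementalLetter {
  sealed trait Result[A] {
    // Supply more input. Only a Partial result is waiting for it; in this
    // sketch, Done keeps the extra text as leftover and Fail stays a failure.
    def feed(more: String): Result[A] = this match {
      case Partial(k)     => k(more)
      case Done(rest, a)  => Done(rest + more, a)
      case f @ Fail(_, _) => f
    }
  }
  final case class Done[A](remaining: String, value: A) extends Result[A]
  final case class Fail[A](remaining: String, message: String) extends Result[A]
  final case class Partial[A](k: String => Result[A]) extends Result[A]

  // Parse one letter. With no input at all we cannot decide yet, so we
  // return a Partial that resumes parsing once more text is fed in.
  def letter(in: String): Result[Char] =
    if (in.isEmpty) Partial[Char](more => letter(more))
    else if (in.head.isLetter) Done(in.tail, in.head)
    else Fail[Char](in, "letter")
}
```

Here IncrementalLetter.letter("").feed("abc") resumes the suspended parse on "abc" and yields Done("bc", 'a'), the same shape as the transcript above.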

The many combinator turns a Parser[A] into a Parser[List[A]] that applies the original parser zero or more times.

scala> many(letter).parse("abc")
res8: atto.ParseResult[List[Char]] = Partial(atto.Parser$Internal$Partial$$Lambda$10429/1489281972@30a92438)

scala> many(letter).parse("abc").feed("def")
res9: atto.ParseResult[List[Char]] = Partial(atto.Parser$Internal$Partial$$Lambda$10429/1489281972@60ca2e90)

Because more letters may still arrive, the parser stays Partial. We call done to indicate that there is no more input.

scala> many(letter).parse("abc").feed("def").done
res10: atto.ParseResult[List[Char]] = Done(,List(a, b, c, d, e, f))
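Why does many(letter) stay Partial until done? In the hypothetical model below (not atto’s code), running out of input in the middle of a run of letters is ambiguous, so the parser returns a continuation. That continuation receives Some(chunk) when more input is fed and None when the caller signals the end of input, at which point it can finally commit to an answer. The names manyLetters and the Option-based continuation are illustrative assumptions.

```scala
// Hypothetical, simplified model; not atto's implementation. Here a Partial
// continuation receives Some(chunk) when more input arrives, or None when
// the caller signals that the input has ended.
object IncrementalMany {
  sealed trait Result[A] {
    def feed(more: String): Result[A] = this match {
      case Partial(k)    => k(Some(more))
      case Done(rest, a) => Done(rest + more, a)
    }
    // End of input: a parser that was still waiting must now commit.
    def done: Result[A] = this match {
      case Partial(k) => k(None)
      case d          => d
    }
  }
  final case class Done[A](remaining: String, value: A) extends Result[A]
  final case class Partial[A](k: Option[String] => Result[A]) extends Result[A]

  // Zero or more letters. Running out of input is ambiguous (more letters
  // might follow), so we stay Partial, carrying the letters seen so far.
  def manyLetters(in: String, acc: List[Char] = Nil): Result[List[Char]] = {
    val letters = in.takeWhile(_.isLetter).toList
    val rest    = in.drop(letters.length)
    val soFar   = acc ++ letters
    if (rest.nonEmpty) Done(rest, soFar) // a non-letter ends the run
    else {
      val k: Option[String] => Result[List[Char]] = {
        case Some(more) => manyLetters(more, soFar) // keep going
        case None       => Done("", soFar)          // input ended: commit
      }
      Partial(k)
    }
  }
}
```

With this model, manyLetters("abc").feed("def").done produces all six letters with no leftover input, as in the transcript above, while manyLetters("ab1") finishes immediately because the digit ends the run.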

Parser is a functor, so you can map over the result and turn it into something else.

scala> many(letter).map(_.mkString).parse("abc").feed("def").done
res11: atto.ParseResult[String] = Done(,abcdef)
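In a simplified, non-incremental model (hypothetical, not atto’s implementation), map only transforms the result value and passes the leftover input through untouched, which is why mapping can never change what a parser consumes. The names run, manyLetter and word are illustrative.

```scala
// Hypothetical, simplified (non-incremental) parser model; not atto's
// implementation. `map` changes the result but not how input is consumed.
object FunctorSketch {
  final case class Parser[A](run: String => Option[(A, String)]) {
    def map[B](f: A => B): Parser[B] =
      Parser[B](in => run(in).map { case (a, rest) => (f(a), rest) })
  }

  // Zero or more letters; never fails, possibly consuming nothing.
  val manyLetter: Parser[List[Char]] =
    Parser[List[Char]] { in =>
      val letters = in.takeWhile(_.isLetter)
      Some((letters.toList, in.drop(letters.length)))
    }

  // Same input consumption as manyLetter, but the result is a String.
  val word: Parser[String] = manyLetter.map(_.mkString)
}
```

Mapping with identity gives back a parser that behaves exactly like the original, one of the functor laws.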

The ~ combinator turns a Parser[A] and a Parser[B] into a Parser[(A, B)] that runs them in sequence.

scala> letter ~ digit
res12: atto.Parser[(Char, Char)] = (letter) ~ digit

scala> (letter ~ digit).parse("a1")
res13: atto.ParseResult[(Char, Char)] = Done(,(a,1))

scala> (many(letter) ~ many(digit)).parse("aaa")
res14: atto.ParseResult[(List[Char], List[Char])] = Partial(atto.Parser$Internal$Partial$$Lambda$10429/1489281972@32cca68e)

scala> (many(letter) ~ many(digit)).parse("aaa").feed("bcd123").done
res15: atto.ParseResult[(List[Char], List[Char])] = Done(,(List(a, a, a, b, c, d),List(1, 2, 3)))

scala> (many(letter) ~ many(digit)).map { case (a, b) => a ++ b } .parse("aaa").feed("bcd123").done
res16: atto.ParseResult[List[Char]] = Done(,List(a, a, a, b, c, d, 1, 2, 3))
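Sequencing with ~ can be sketched in a simplified model (hypothetical, not atto’s implementation): run the first parser, run the second on the leftover input, and pair the results. Because ~ is left-associative, chaining it produces the nested tuples seen above. The helper satisfy is an illustrative name.

```scala
// Hypothetical, simplified parser model; not atto's implementation.
object ProductSketch {
  final case class Parser[A](run: String => Option[(A, String)]) {
    def map[B](f: A => B): Parser[B] =
      Parser[B](in => run(in).map { case (a, rest) => (f(a), rest) })

    // Run this parser, then `that` on whatever input is left over, and pair
    // the two results. If either step fails, the whole parser fails.
    def ~[B](that: Parser[B]): Parser[(A, B)] =
      Parser[(A, B)] { in =>
        run(in).flatMap { case (a, rest1) =>
          that.run(rest1).map { case (b, rest2) => ((a, b), rest2) }
        }
      }
  }

  // Consume a single character that satisfies the predicate.
  def satisfy(p: Char => Boolean): Parser[Char] =
    Parser[Char](in => if (in.nonEmpty && p(in.head)) Some((in.head, in.tail)) else None)

  val letter: Parser[Char] = satisfy(_.isLetter)
  val digit: Parser[Char]  = satisfy(_.isDigit)
}
```

For instance, (letter ~ digit ~ letter) in this model has result type ((Char, Char), Char), which is exactly the nesting that makes destructuring in map awkward.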

Destructuring the pair in map is a pain, and it gets worse with nested pairs.

scala> (letter ~ int ~ digit ~ byte)
res17: atto.Parser[(((Char, Int), Char), Byte)] = (((letter) ~ int) ~ digit) ~ byte

But have no fear: Parser is an applicative functor, so we can use mapN to combine the results of several parsers without building nested pairs.

scala> (many(letter), many(digit)).mapN(_ ++ _).parse("aaa").feed("bcd123").done
res18: atto.ParseResult[List[Char]] = Done(,List(a, a, a, b, c, d, 1, 2, 3))
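Conceptually, mapN runs the parsers in sequence and hands their results to your function as separate arguments, so you never see the nested tuples. The sketch below uses hypothetical map2 and map3 helpers over a simplified parser model (not atto’s or cats’ actual code) to show the idea: each helper builds the nested product with ~ and destructures it once, in one place.

```scala
// Hypothetical, simplified parser model; not atto's or cats' implementation.
object ApplicativeSketch {
  final case class Parser[A](run: String => Option[(A, String)]) {
    def map[B](f: A => B): Parser[B] =
      Parser[B](in => run(in).map { case (a, rest) => (f(a), rest) })

    // Sequence two parsers and pair their results.
    def ~[B](that: Parser[B]): Parser[(A, B)] =
      Parser[(A, B)] { in =>
        run(in).flatMap { case (a, rest1) =>
          that.run(rest1).map { case (b, rest2) => ((a, b), rest2) }
        }
      }
  }

  // The nested tuples are destructured here, once, instead of at every call site.
  def map2[A, B, C](pa: Parser[A], pb: Parser[B])(f: (A, B) => C): Parser[C] =
    (pa ~ pb).map { case (a, b) => f(a, b) }

  def map3[A, B, C, D](pa: Parser[A], pb: Parser[B], pc: Parser[C])(f: (A, B, C) => D): Parser[D] =
    (pa ~ pb ~ pc).map { case ((a, b), c) => f(a, b, c) }

  // Zero or more characters matching the predicate; never fails.
  def manyOf(p: Char => Boolean): Parser[List[Char]] =
    Parser[List[Char]] { in =>
      val chars = in.takeWhile(p)
      Some((chars.toList, in.drop(chars.length)))
    }

  val manyLetter: Parser[List[Char]] = manyOf(_.isLetter)
  val manyDigit: Parser[List[Char]]  = manyOf(_.isDigit)
}
```

In this model, map2(manyLetter, manyDigit)(_ ++ _) plays the role of (many(letter), many(digit)).mapN(_ ++ _) in the transcript.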

In fact, it’s a monad. This allows the result of one parser to influence the behavior of subsequent parsers. Here we build a parser that parses an integer followed by a string of exactly that many characters.

scala> val p = for { n <- int; c <- take(n) } yield c
p: atto.Parser[String] = (int) flatMap ...

scala> p.parse("3abcdef")
res19: atto.ParseResult[String] = Done(def,abc)

scala> p.parse("4abcdef")
res20: atto.ParseResult[String] = Done(ef,abcd)
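flatMap is what makes this possible: the value produced by the first parser is used to choose the second. Here is a hypothetical, simplified model (not atto’s implementation) with a digits-only int and a take that fails when fewer than n characters remain; map is derived from flatMap and a pure that consumes no input. All names are illustrative.

```scala
// Hypothetical, simplified parser model; not atto's implementation.
object MonadSketch {
  final case class Parser[A](run: String => Option[(A, String)]) {
    // Monadic bind: the value produced by this parser decides which parser
    // runs next on the remaining input.
    def flatMap[B](f: A => Parser[B]): Parser[B] =
      Parser[B](in => run(in).flatMap { case (a, rest) => f(a).run(rest) })

    def map[B](f: A => B): Parser[B] =
      flatMap(a => Parser.pure(f(a)))
  }

  object Parser {
    // Succeed with `a` without consuming any input.
    def pure[A](a: A): Parser[A] = Parser[A](in => Some((a, in)))
  }

  // One or more decimal digits read as an Int (no sign handling).
  val int: Parser[Int] =
    Parser[Int] { in =>
      val digits = in.takeWhile(_.isDigit)
      if (digits.isEmpty) None else Some((digits.toInt, in.drop(digits.length)))
    }

  // Take exactly n characters, failing if fewer than n remain.
  def take(n: Int): Parser[String] =
    Parser[String](in => if (in.length >= n) Some((in.take(n), in.drop(n))) else None)

  // Read a count, then exactly that many characters.
  val p: Parser[String] = for { n <- int; c <- take(n) } yield c
}
```

Here MonadSketch.p.run("3abcdef") gives Some(("abc", "def")), while p.run("9abc") fails because only three characters remain after the count.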

In the next chapter we will build up parsers for larger structures.