April 13, 2014

Atto 0.2 Tutorial

This is an intro tutorial for the atto parser combinator library, compiled by the tut tutorial generator.

Getting Started

Let’s import some stuff.

scala> import scalaz._, Scalaz._
import scalaz._
import Scalaz._

scala> import atto._, Atto._
import atto._
import Atto._

Rock on, let’s parse an integer!

scala> int parseOnly "123abc"
res0: atto.ParseResult[Int] = Done(abc,123)

Very Simple Examples

A Parser[A] consumes characters and produces a value of type A. Let’s look at a predefined parser that matches only characters where isLetter is true.

scala> letter
res1: atto.Parser[Char] = letter

We can ask a parser to parse a string, and we get back a ParseResult[A]. The Done constructor shows the remaining input (if any) and the answer.

scala> letter.parse("x")
res2: atto.ParseResult[Char] = Done(,x)

scala> letter.parse("xyz")
res3: atto.ParseResult[Char] = Done(yz,x)

The Fail constructor shows us the remaining input, the parsing stack (ignore this for now), and a description of the failure.

scala> letter.parse("1")
res4: atto.ParseResult[Char] = Fail(1,List(),Failure reading:letter)

The Partial constructor indicates that the parser has neither succeeded nor failed; more input is required before we will know. We can feed more data to continue parsing. Our parsers thus support incremental parsing which allows us to parse directly from a stream, for example.

scala> letter.parse("")
res5: atto.ParseResult[Char] = Partial(<function1>)

scala> letter.parse("").feed("abc")
res6: atto.ParseResult[Char] = Done(bc,a)

The many combinator turns a Parser[A] into a Parser[List[A]].

scala> many(letter).parse("abc")
res7: atto.ParseResult[List[Char]] = Partial(<function1>)

scala> many(letter).parse("abc").feed("def")
res8: atto.ParseResult[List[Char]] = Partial(<function1>)

The result is still Partial because there may be more letters coming, so we call done to indicate that there is no more input.

scala> many(letter).parse("abc").feed("def").done
res9: atto.ParseResult[List[Char]] = Done(,List(a, b, c, d, e, f))
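
The same feed and done calls extend to any number of chunks. Here is a minimal sketch, assuming the input arrives as a list of strings; the name chunks and its contents are made up for illustration.

```scala
import scalaz._, Scalaz._
import atto._, Atto._

// Hypothetical input: pieces of text as they might arrive from a stream.
val chunks = List("ab", "cd", "ef")

// Start the parser with no input, feed each chunk in turn,
// then call done to signal that nothing more is coming.
val fed = chunks.foldLeft(many(letter).parse(""))((r, chunk) => r.feed(chunk)).done

fed.option // Some(List(a, b, c, d, e, f)), the same letters as res9 above
```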

Parser is a functor.

scala> many(letter).map(_.mkString).parse("abc").feed("def").done
res10: atto.ParseResult[String] = Done(,abcdef)

The ~ combinator combines a Parser[A] and a Parser[B] into a Parser[(A,B)].

scala> letter ~ digit
res11: atto.Parser[(Char, Char)] = (letter) ~ digit

scala> (letter ~ digit).parse("a1")
res12: atto.ParseResult[(Char, Char)] = Done(,(a,1))

scala> (many(letter) ~ many(digit)).parse("aaa")
res13: atto.ParseResult[(List[Char], List[Char])] = Partial(<function1>)

scala> (many(letter) ~ many(digit)).parse("aaa").feed("bcd123").done
res14: atto.ParseResult[(List[Char], List[Char])] = Done(,(List(a, a, a, b, c, d),List(1, 2, 3)))

scala> (many(letter) ~ many(digit)).map(p => p._1 ++ p._2).parse("aaa").feed("bcd123").done
res15: atto.ParseResult[List[Char]] = Done(,List(a, a, a, b, c, d, 1, 2, 3))

Destructuring the pair in map is a pain, and it gets worse with nested pairs.

scala> (letter ~ int ~ digit ~ byte)
res16: atto.Parser[(((Char, Int), Char), Byte)] = (((letter) ~ int) ~ digit) ~ byte
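
To make the pain concrete, here is a sketch of mapping over a nested pair with a pattern match. The parser name triple and the input string are made up for illustration.

```scala
import scalaz._, Scalaz._
import atto._, Atto._

// Hypothetical parser: letter, digit, letter, glued back into a String.
// Every extra ~ adds another layer of parentheses to the pattern.
val triple: Parser[String] =
  (letter ~ digit ~ letter).map { case ((a, d), b) => List(a, d, b).mkString }

triple.parseOnly("a1b").option // Some(a1b)
```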

But have no fear, Parser is an applicative functor.

scala> (many(letter) |@| many(digit))(_ ++ _).parse("aaa").feed("bcd123").done
res17: atto.ParseResult[List[Char]] = Done(,List(a, a, a, b, c, d, 1, 2, 3))

In fact, it’s a monad. This allows the result of one parser to influence the behavior of subsequent parsers. Here we build a parser that parses an integer followed by an arbitrary string of that length.

scala> val p = for { n <- int; c <- take(n) } yield c
p: atto.Parser[String] = (int) flatMap ...

scala> p.parse("3abcdef")
res18: atto.ParseResult[String] = Done(def,abc)

scala> p.parse("4abcdef")
res19: atto.ParseResult[String] = Done(ef,abcd)

A Larger Example

This is taken from a nice tutorial over at FP Complete.

First let’s define a data type for IP addresses.

scala> import spire.math.UByte // we need this for unsigned bytes
import spire.math.UByte

scala> case class IP(a: UByte, b: UByte, c: UByte, d: UByte) 
defined class IP

As a first pass we can parse an IP address in the form 128.42.30.1 by using the ubyte and char parsers directly, in a for comprehension.

scala> import atto.parser.spire._ // we need this for spire parsers
import atto.parser.spire._

scala> val ip: Parser[IP] =
     |   for {
     |     a <- ubyte
     |     _ <- char('.')
     |     b <- ubyte
     |     _ <- char('.')
     |     c <- ubyte
     |     _ <- char('.')
     |     d <- ubyte
     |   } yield IP(a, b, c, d)
ip: atto.Parser[IP] = (ubyte) flatMap ...

scala> ip parseOnly "foo.bar"
res20: atto.ParseResult[IP] = Fail(foo.bar,List(ubyte, int),Failure reading:bigInt)

scala> ip parseOnly "128.42.42.1"
res21: atto.ParseResult[IP] = Done(,IP(128,42,42,1))

scala> ip.parseOnly("128.42.42.1").option
res22: Option[IP] = Some(IP(128,42,42,1))

Let’s factor out the dot.

scala> val dot: Parser[Char] =  char('.')
dot: atto.Parser[Char] = '.'

The <~ and ~> combinators combine two parsers sequentially, discarding the value produced by the parser on the ~ side. We can use this to simplify our comprehension a bit.

scala> val ip1: Parser[IP] =
     |   for { 
     |     a <- ubyte <~ dot
     |     b <- ubyte <~ dot
     |     c <- ubyte <~ dot
     |     d <- ubyte
     |   } yield IP(a, b, c, d)
ip1: atto.Parser[IP] = ((ubyte) <~ '.') flatMap ...

scala> ip1.parseOnly("128.42.42.1").option
res23: Option[IP] = Some(IP(128,42,42,1))
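
The examples here only use <~, so here is a small sketch of ~> working together with it. The parser parenInt is hypothetical and not part of the IP example.

```scala
import scalaz._, Scalaz._
import atto._, Atto._

// Hypothetical parser: an integer wrapped in parentheses, such as "(42)".
// ~> discards the value on its left, <~ discards the value on its right,
// so only the Int in the middle survives.
val parenInt: Parser[Int] = char('(') ~> int <~ char(')')

parenInt.parseOnly("(42)").option // Some(42)
```

In both operators the arrow points at the value that is kept.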

We can name our parser, which provides slightly more enlightening failure messages.

scala> val ip2 = ip1 named "ip-address"
ip2: atto.Parser[IP] = ip-address

scala> val ip3 = ip1 namedOpaque "ip-address" // difference is illustrated below
ip3: atto.Parser[IP] = ip-address

scala> ip2 parseOnly "foo.bar"
res24: atto.ParseResult[IP] = Fail(foo.bar,List(ip-address, ubyte, int),Failure reading:bigInt)

scala> ip3 parseOnly "foo.bar"
res25: atto.ParseResult[IP] = Fail(foo.bar,List(),Failure reading:ip-address)

Since none of the values bound on the left-hand side of our <- are used on the right-hand side of a later <-, we don’t actually need a monad; we can use applicative syntax here.

scala> val ubyteDot = ubyte <~ dot // why not?
ubyteDot: atto.Parser[spire.math.UByte] = (ubyte) <~ '.'

scala> val ip4 = (ubyteDot |@| ubyteDot |@| ubyteDot |@| ubyte)(IP.apply) named "ip-address"
ip4: atto.Parser[IP] = ip-address

scala> ip4.parseOnly("128.42.42.1").option
res26: Option[IP] = Some(IP(128,42,42,1))

We might prefer to get some information about failure, so either is an, um, option.

scala> ip4.parseOnly("abc.42.42.1").either
res27: scalaz.\/[String,IP] = -\/(Failure reading:bigInt)

scala> ip4.parseOnly("128.42.42.1").either
res28: scalaz.\/[String,IP] = \/-(IP(128,42,42,1))

Here’s an example log. Let’s write a parser for it.

scala> val logData = 
     |   """|2013-06-29 11:16:23 124.67.34.60 keyboard
     |      |2013-06-29 11:32:12 212.141.23.67 mouse
     |      |2013-06-29 11:33:08 212.141.23.67 monitor
     |      |2013-06-29 12:12:34 125.80.32.31 speakers
     |      |2013-06-29 12:51:50 101.40.50.62 keyboard
     |      |2013-06-29 13:10:45 103.29.60.13 mouse
     |      |""".stripMargin
logData: String = 
"2013-06-29 11:16:23 124.67.34.60 keyboard
2013-06-29 11:32:12 212.141.23.67 mouse
2013-06-29 11:33:08 212.141.23.67 monitor
2013-06-29 12:12:34 125.80.32.31 speakers
2013-06-29 12:51:50 101.40.50.62 keyboard
2013-06-29 13:10:45 103.29.60.13 mouse
"

And some data types for the parsed data.

scala> case class Date(year: Int, month: Int, day: Int)
defined class Date

scala> case class Time(hour: Int, minutes: Int, seconds: Int)
defined class Time

scala> case class DateTime(date: Date, time: Time)
defined class DateTime

scala> sealed trait Product // Products are an enumerated type
defined trait Product

scala> case object Mouse extends Product
defined module Mouse

scala> case object Keyboard extends Product
defined module Keyboard

scala> case object Monitor extends Product
defined module Monitor

scala> case object Speakers extends Product
defined module Speakers

scala> case class LogEntry(entryTime: DateTime, entryIP: IP, entryProduct: Product)
defined class LogEntry

scala> type Log = List[LogEntry]
defined type alias Log

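There’s no built-in parser for fixed-width ints, so we can just make one. The toInt conversion can throw (a run of ten or more digits can overflow Int), so we catch the NumberFormatException and turn it into a parse failure with err. Whether a try/catch like this belongs in a tutorial is an open question.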

scala> def fixed(n:Int): Parser[Int] =
     |   count(n, digit).map(_.mkString).flatMap { s => 
     |     try ok(s.toInt) catch { case e: NumberFormatException => err(e.toString) }
     |   }
fixed: (n: Int)atto.Parser[Int]
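
The fixed parser guards toInt with a try/catch. As a sketch of why that guard matters, here is the conversion on its own in plain Scala, with no atto involved. The helper toIntEither is hypothetical; it uses scala.util.Try as one possible alternative to an explicit catch block.

```scala
import scala.util.Try

// Hypothetical helper, not part of atto: convert a digit string to an Int,
// reporting failure as a Left instead of throwing.
def toIntEither(s: String): Either[String, Int] =
  Try(s.toInt).toOption.toRight(s"not a valid Int: $s")

toIntEither("0029")        // Right(29): leading zeros are accepted
toIntEither("99999999999") // Left(...): eleven digits overflow Int
```

For the widths used in this tutorial (2 and 4 digits) the conversion cannot overflow; the guard only matters for larger n.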

Now we have what we need to put the log parser together.

scala> val date: Parser[Date] =
     |   (fixed(4) <~ char('-') |@| fixed(2) <~ char('-') |@| fixed(2))(Date.apply)
date: atto.Parser[Date] = (((ok(<function2>)) flatMap ...) flatMap ...) flatMap ...

scala> val time: Parser[Time] =
     |   (fixed(2) <~ char(':') |@| fixed(2) <~ char(':') |@| fixed(2))(Time.apply)
time: atto.Parser[Time] = (((ok(<function2>)) flatMap ...) flatMap ...) flatMap ...

scala> val dateTime: Parser[DateTime] =
     |   (date <~ char(' ') |@| time)(DateTime.apply)
dateTime: atto.Parser[DateTime] = (((ok(<function2>)) flatMap ...) flatMap ...) flatMap ...

scala> val product: Parser[Product] = {
     |   string("keyboard").map(_ => Keyboard) |
     |   string("mouse")   .map(_ => Mouse)    |
     |   string("monitor") .map(_ => Monitor)  |
     |   string("speakers").map(_ => Speakers)
     | }
product: atto.Parser[Product] = ((((string("keyboard")) map ...) | ...) | ...) | ...

scala> val logEntry: Parser[LogEntry] =
     |   (dateTime <~ char(' ') |@| ip <~ char(' ') |@| product)(LogEntry.apply)
logEntry: atto.Parser[LogEntry] = (((ok(<function2>)) flatMap ...) flatMap ...) flatMap ...

scala> val log: Parser[Log] =
     |   sepBy(logEntry, char('\n'))
log: atto.Parser[Log] = 
sepBy((((ok(<function2>)) flatMap ...) flatMap ...) flatMap ...,'
')
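
The sepBy combinator is used above without much ceremony, so here is a smaller sketch of what it does: it parses zero or more values with its first argument, separated by whatever its second argument matches. The parser csvInts and its input are made up for illustration.

```scala
import scalaz._, Scalaz._
import atto._, Atto._

// Hypothetical parser: comma-separated integers such as "1,2,3".
// The separators are consumed and dropped; only the Ints are collected.
val csvInts: Parser[List[Int]] = sepBy(int, char(','))

csvInts.parseOnly("1,2,3").option // Some(List(1, 2, 3))
```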

scala> (log parseOnly logData).option.foldMap(_.mkString("\n"))
res29: String = 
LogEntry(DateTime(Date(2013,6,29),Time(11,16,23)),IP(124,67,34,60),Keyboard)
LogEntry(DateTime(Date(2013,6,29),Time(11,32,12)),IP(212,141,23,67),Mouse)
LogEntry(DateTime(Date(2013,6,29),Time(11,33,8)),IP(212,141,23,67),Monitor)
LogEntry(DateTime(Date(2013,6,29),Time(12,12,34)),IP(125,80,32,31),Speakers)
LogEntry(DateTime(Date(2013,6,29),Time(12,51,50)),IP(101,40,50,62),Keyboard)
LogEntry(DateTime(Date(2013,6,29),Time(13,10,45)),IP(103,29,60,13),Mouse)