April 13, 2014
Atto 0.2 Tutorial
This is an intro tutorial for the atto parser combinator library, compiled by the tut tutorial generator.
Getting Started
Let’s import some stuff.
scala> import scalaz._, Scalaz._
import scalaz._
import Scalaz._
scala> import atto._, Atto._
import atto._
import Atto._
Rock on, let’s parse an integer!
scala> int parseOnly "123abc"
res0: atto.ParseResult[Int] = Done(abc,123)
Very Simple Examples
A Parser[A]
consumes characters and produces a value of type A
. Let’s look at a predefined parser that matches only
characters where isLetter
is true.
scala> letter
res1: atto.Parser[Char] = letter
We can ask a parser to parse a string, and we get back a ParseResult[A]
. The Done
constructor shows the remaining input (if any) and the answer.
scala> letter.parse("x")
res2: atto.ParseResult[Char] = Done(,x)
scala> letter.parse("xyz")
res3: atto.ParseResult[Char] = Done(yz,x)
The Failure
constructor shows us the remaining input, the parsing stack (ignore this for now), and a description
of the failiure.
scala> letter.parse("1")
res4: atto.ParseResult[Char] = Fail(1,List(),Failure reading:letter)
The Partial
constructor indicates that the parser has neither succeeded nor failed; more input is required before we will know. We can feed
more data to continue parsing. Our parsers thus support incremental parsing
which allows us to parse directly from a stream, for example.
scala> letter.parse("")
res5: atto.ParseResult[Char] = Partial(<function1>)
scala> letter.parse("").feed("abc")
res6: atto.ParseResult[Char] = Done(bc,a)
The many
combinator turns a Parser[A]
into a Parser[List[A]]
.
scala> many(letter).parse("abc")
res7: atto.ParseResult[List[Char]] = Partial(<function1>)
scala> many(letter).parse("abc").feed("def")
res8: atto.ParseResult[List[Char]] = Partial(<function1>)
There may be more letters coming, so we can say we’re done
to indicate that there is no more input.
scala> many(letter).parse("abc").feed("def").done
res9: atto.ParseResult[List[Char]] = Done(,List(a, b, c, d, e, f))
Parser
is a functor.
scala> many(letter).map(_.mkString).parse("abc").feed("def").done
res10: atto.ParseResult[String] = Done(,abcdef)
The ~
combinator turns Parser[A], Parser[B]
into Parser[(A,B)]
scala> letter ~ digit
res11: atto.Parser[(Char, Char)] = (letter) ~ digit
scala> (letter ~ digit).parse("a1")
res12: atto.ParseResult[(Char, Char)] = Done(,(a,1))
scala> (many(letter) ~ many(digit)).parse("aaa")
res13: atto.ParseResult[(List[Char], List[Char])] = Partial(<function1>)
scala> (many(letter) ~ many(digit)).parse("aaa").feed("bcd123").done
res14: atto.ParseResult[(List[Char], List[Char])] = Done(,(List(a, a, a, b, c, d),List(1, 2, 3)))
scala> (many(letter) ~ many(digit)).map(p => p._1 ++ p._2).parse("aaa").feed("bcd123").done
res15: atto.ParseResult[List[Char]] = Done(,List(a, a, a, b, c, d, 1, 2, 3))
Destructuring the pair in map
is a pain, and it gets worse with nested pairs.
scala> (letter ~ int ~ digit ~ byte)
res16: atto.Parser[(((Char, Int), Char), Byte)] = (((letter) ~ int) ~ digit) ~ byte
But have no fear, Parser
is an applicative functor.
scala> (many(letter) |@| many(digit))(_ ++ _).parse("aaa").feed("bcd123").done
res17: atto.ParseResult[List[Char]] = Done(,List(a, a, a, b, c, d, 1, 2, 3))
In fact, it’s a monad. This allows the result of one parser to influence the behavior of subsequent parsers. Here we build a parser that parses an integer followed by an arbitrary string of that length.
scala> val p = for { n <- int; c <- take(n) } yield c
p: atto.Parser[String] = (int) flatMap ...
scala> p.parse("3abcdef")
res18: atto.ParseResult[String] = Done(def,abc)
scala> p.parse("4abcdef")
res19: atto.ParseResult[String] = Done(ef,abcd)
A Larger Example
This is taken from a nice tutorial over at FP Complete.
First let’s define a data type for IP addresses.
scala> import spire.math.UByte // we need this for unisigned bytes
import spire.math.UByte
scala> case class IP(a: UByte, b: UByte, c: UByte, d: UByte)
defined class IP
As a first pass we can parse an IP address in the form 128.42.30.1 by using the ubyte
and
char
parsers directly, in a for
comprehension.
scala> import atto.parser.spire._ // we need this for spire parsers
import atto.parser.spire._
scala> val ip: Parser[IP] =
| for {
| a <- ubyte
| _ <- char('.')
| b <- ubyte
| _ <- char('.')
| c <- ubyte
| _ <- char('.')
| d <- ubyte
| } yield IP(a, b, c, d)
ip: atto.Parser[IP] = (ubyte) flatMap ...
scala> ip parseOnly "foo.bar"
res20: atto.ParseResult[IP] = Fail(foo.bar,List(ubyte, int),Failure reading:bigInt)
scala> ip parseOnly "128.42.42.1"
res21: atto.ParseResult[IP] = Done(,IP(128,42,42,1))
scala> ip.parseOnly("128.42.42.1").option
res22: Option[IP] = Some(IP(128,42,42,1))
Let’s factor out the dot.
scala> val dot: Parser[Char] = char('.')
dot: atto.Parser[Char] = '.'
The <~
and ~>
combinators combine two parsers sequentially, discarding the value produced by
the parser on the ~
side. We can use this to simplify our comprehension a bit.
scala> val ip1: Parser[IP] =
| for {
| a <- ubyte <~ dot
| b <- ubyte <~ dot
| c <- ubyte <~ dot
| d <- ubyte
| } yield IP(a, b, c, d)
ip1: atto.Parser[IP] = ((ubyte) <~ '.') flatMap ...
scala> ip1.parseOnly("128.42.42.1").option
res23: Option[IP] = Some(IP(128,42,42,1))
We can name our parser, which provides slightly more enlightening failure messages
scala> val ip2 = ip1 named "ip-address"
ip2: atto.Parser[IP] = ip-address
scala> val ip3 = ip1 namedOpaque "ip-address" // difference is illustrated below
ip3: atto.Parser[IP] = ip-address
scala> ip2 parseOnly "foo.bar"
res24: atto.ParseResult[IP] = Fail(foo.bar,List(ip-address, ubyte, int),Failure reading:bigInt)
scala> ip3 parseOnly "foo.bar"
res25: atto.ParseResult[IP] = Fail(foo.bar,List(),Failure reading:ip-address)
Since nothing that occurs on the right-hand side of our <- appears on the left-hand side, we don’t actually need a monad; we can use applicative syntax here.
scala> val ubyteDot = ubyte <~ dot // why not?
ubyteDot: atto.Parser[spire.math.UByte] = (ubyte) <~ '.'
scala> val ip4 = (ubyteDot |@| ubyteDot |@| ubyteDot |@| ubyte)(IP.apply) named "ip-address"
ip4: atto.Parser[IP] = ip-address
scala> ip4.parseOnly("128.42.42.1").option
res26: Option[IP] = Some(IP(128,42,42,1))
We might prefer to get some information about failure, so either
is an, um, option.
scala> ip4.parseOnly("abc.42.42.1").either
res27: scalaz.\/[String,IP] = -\/(Failure reading:bigInt)
scala> ip4.parseOnly("128.42.42.1").either
res28: scalaz.\/[String,IP] = \/-(IP(128,42,42,1))
Here’s an example log. Let’s write a parser for it.
scala> val logData =
| """|2013-06-29 11:16:23 124.67.34.60 keyboard
| |2013-06-29 11:32:12 212.141.23.67 mouse
| |2013-06-29 11:33:08 212.141.23.67 monitor
| |2013-06-29 12:12:34 125.80.32.31 speakers
| |2013-06-29 12:51:50 101.40.50.62 keyboard
| |2013-06-29 13:10:45 103.29.60.13 mouse
| |""".stripMargin
logData: String =
"2013-06-29 11:16:23 124.67.34.60 keyboard
2013-06-29 11:32:12 212.141.23.67 mouse
2013-06-29 11:33:08 212.141.23.67 monitor
2013-06-29 12:12:34 125.80.32.31 speakers
2013-06-29 12:51:50 101.40.50.62 keyboard
2013-06-29 13:10:45 103.29.60.13 mouse
"
And some data types for the parsed data.
scala> case class Date(year: Int, month: Int, day: Int)
defined class Date
scala> case class Time(hour: Int, minutes: Int, seconds: Int)
defined class Time
scala> case class DateTime(date: Date, time: Time)
defined class DateTime
scala> sealed trait Product // Products are an enumerated type
defined trait Product
scala> case object Mouse extends Product
defined module Mouse
scala> case object Keyboard extends Product
defined module Keyboard
scala> case object Monitor extends Product
defined module Monitor
scala> case object Speakers extends Product
defined module Speakers
scala> case class LogEntry(entryTime: DateTime, entryIP: IP, entryProduct: Product)
defined class LogEntry
scala> type Log = List[LogEntry]
defined type alias Log
There’s no built-in parser for fixed-width ints, so we can just make one. Probably shouldn’t be doing this in a tutorial though. How should we handle this?
scala> def fixed(n:Int): Parser[Int] =
| count(n, digit).map(_.mkString).flatMap { s =>
| try ok(s.toInt) catch { case e: NumberFormatException => err(e.toString) }
| }
fixed: (n: Int)atto.Parser[Int]
Now we have what we need to put the log parser together.
scala> val date: Parser[Date] =
| (fixed(4) <~ char('-') |@| fixed(2) <~ char('-') |@| fixed(2))(Date.apply)
date: atto.Parser[Date] = (((ok(<function2>)) flatMap ...) flatMap ...) flatMap ...
scala> val time: Parser[Time] =
| (fixed(2) <~ char(':') |@| fixed(2) <~ char(':') |@| fixed(2))(Time.apply)
time: atto.Parser[Time] = (((ok(<function2>)) flatMap ...) flatMap ...) flatMap ...
scala> val dateTime: Parser[DateTime] =
| (date <~ char(' ') |@| time)(DateTime.apply)
dateTime: atto.Parser[DateTime] = (((ok(<function2>)) flatMap ...) flatMap ...) flatMap ...
scala> val product: Parser[Product] = {
| string("keyboard").map(_ => Keyboard) |
| string("mouse") .map(_ => Mouse) |
| string("monitor") .map(_ => Monitor) |
| string("speakers").map(_ => Speakers)
| }
product: atto.Parser[Product] = ((((string("keyboard")) map ...) | ...) | ...) | ...
scala> val logEntry: Parser[LogEntry] =
| (dateTime <~ char(' ') |@| ip <~ char(' ') |@| product)(LogEntry.apply)
logEntry: atto.Parser[LogEntry] = (((ok(<function2>)) flatMap ...) flatMap ...) flatMap ...
scala> val log: Parser[Log] =
| sepBy(logEntry, char('\n'))
log: atto.Parser[Log] =
sepBy((((ok(<function2>)) flatMap ...) flatMap ...) flatMap ...,'
')
scala> (log parseOnly logData).option.foldMap(_.mkString("\n"))
res29: String =
LogEntry(DateTime(Date(2013,6,29),Time(11,16,23)),IP(124,67,34,60),Keyboard)
LogEntry(DateTime(Date(2013,6,29),Time(11,32,12)),IP(212,141,23,67),Mouse)
LogEntry(DateTime(Date(2013,6,29),Time(11,33,8)),IP(212,141,23,67),Monitor)
LogEntry(DateTime(Date(2013,6,29),Time(12,12,34)),IP(125,80,32,31),Speakers)
LogEntry(DateTime(Date(2013,6,29),Time(12,51,50)),IP(101,40,50,62),Keyboard)
LogEntry(DateTime(Date(2013,6,29),Time(13,10,45)),IP(103,29,60,13),Mouse)