Shayne Fletcher: Prolog

Saturday, April 2, 2016

Rotate

This post is inspired by one of those classic "99 problems in Prolog".What we are looking for here are two functions that satisfy these signatures.

val rotate_left : int -> α list -> α list
val rotate_right : int -> α list -> α list

rotate_left n rotates a list $n$ places to the left, rotate_right n rotates a list $n$ places to the right. Examples:

# rotate_left 3 ['a';'b';'c';'d';'e';'f';'g';'h'] ;;
- : char list = ['d'; 'e'; 'f'; 'g'; 'h'; 'a'; 'b'; 'c']

# rotate_left (-2) ['a';'b';'c';'d';'e';'f';'g';'h'] ;;
- : char list = ['g'; 'h'; 'a'; 'b'; 'c'; 'd'; 'e'; 'f']

Of course, rotate_left and rotate_right are inverse functions of each other so we expect, for any int $x$ and list $l$, rotate_right x @@ rotate_left x l $=$ rotate_left x @@ rotate_right x l $=$ l.

Well, there are a variety of solutions to this problem with differing degrees of verbosity, complexity and efficiency. My own attempt at a solution resulted in this.

let rec drop (k : int) (l : α list) : α list =
  match k, l with
  | i, _ when i <= 0 -> l
  | _, [] -> []
  | _, (_ :: xs) -> drop (k - 1) xs

let rec take (k : int) (l : α list) : α list =
  match k, l with
  | i, _ when i <= 0 -> []
  | _, [] -> []
  | _, (x :: xs)  -> x :: take (k - 1) xs

let split_at (n : int) (l : α list) : α list * α list = 
  (take n l), (drop n l)

let rec rotate_left (n : int) (l : α list) : α list =
  match n with
  | _ when n = 0 -> l
  | _ when n < 0 ->  rotate_right (-n) l
  | _ -> 
    let m : int = List.length l in
    let k : int = n mod m in
    let (l : α list), (r : α list) = split_at k l in 
    r @ l

and rotate_right (n : int) (l : α list) : α list =
  match n with
  | _ when n = 0 -> l
  | _ when n < 0 ->  rotate_left (-n) l
  | _ -> 
    let m : int = List.length l in
    let k : int = m - n mod m in
    let (l : α list), (r : α list) = split_at k l in 
    r @ l

So far so good, but then I was shown the following solution in Haskell.

rotateLeft n xs 
  | n >= 0     = take (length xs) $ drop n $ concat $ repeat xs
  | otherwise  = rotateLeft (length xs + n) xs

rotateRight n = rotateLeft (-n)

I found that pretty nifty! See, in the function rotateLeft, repeat xs creates an infinite list of lists, (each a copy of xs), "joins" that infinite list of lists into one infinite list, then the first $n$ elements are dropped from that the list and we take the next length xs which gets us the original list rotated left $n$ places.

I felt compelled to attempt to emulate the program above in OCaml.

The phrasing "works" in Haskell due to the feature of lazy evaluation. OCaml on the other hand is eagerly evaluated. Lazy evaluation is possible in OCaml however, you just need to be explicit about it. Here's a type for "lazy lists" aka "streams".

type α stream =  Nil | Cons of α * α stream Lazy.t

A value of type α Lazy.t is a deferred computation, called a suspension that has the result type α. The syntax lazy$(expr)$ makes a suspension of $expr$, without yet evaluating $expr$. "Forcing" the suspension (using Lazy.force) evaluates $expr$ and returns its result.

Next up, functions to get the head and tail of a stream.

let hd = function | Nil -> failwith "hd" | Cons (h, _) -> h
let tl = function | Nil -> failwith "tl" | Cons (_, t) -> Lazy.force t

Also useful, a function to lift an α list to an α stream.

let from_list (l : α list) : α stream =
  List.fold_right (fun x s -> Cons (x, lazy s)) l Nil

Those are the basic building blocks. Now we turn attention to implementing repeat x to create infinite lists of the repeated value $x$.

let rec repeat (x : α) : α stream = Cons (x, lazy (repeat x))

Now to implement concat (I prefer to call this function by its alternative name flatten).

The characteristic operation of flatten is the joining together of two lists. For eager lists, we can write a function join that appends two lists like this.

let rec join l m =
  match l with
  | [] -> m
  | h :: t -> h :: (join t m)

This generalizes naturally to streams.

let rec join (l : α stream) (m : α stream) =
  match l with
  | Nil -> m
  | Cons (h, t) -> Cons (h, lazy (join (Lazy.force t) m))

For eager lists, we can write flatten in terms of join.

let rec flatten : α list list -> α list = function
   | [] -> []
   | (h :: tl) -> join h (flatten tl)

Emboldened by our earlier success we might try to generalize it to streams like this.

let rec flatten (l : α stream stream) : α stream =
   match l with
   | Nil -> lazy Nil
   | Cons (l, r) ->  join l (flatten (Lazy.force r))

Sadly, no. This definition is going to result in stack overflow. There is an alternative phrasing of flatten we might try.

let rec flatten = function
  | [] -> []
  | [] :: t -> flatten t
  | (x :: xs) :: t -> x :: (flatten (xs :: t))

Happy to say, this one generalizes and gets around the eager evaluation problem that causes the unbounded recursion.

let rec flatten : α stream stream -> α stream = function
  | Nil -> Nil
  | Cons (Nil, t) -> flatten (Lazy.force t)
  | Cons (Cons (x, xs), t) ->
      Cons (x, lazy (flatten (Cons (Lazy.force xs, t))))

take and drop are straight forward generalizations of their eager counterparts.

let rec drop (n : int) (lst : α stream ) : α stream = 
  match (n, lst) with
  | (n, _) when n < 0 -> invalid_arg "negative index in drop"
  | (n, xs) when n = 0 -> xs
  | (_, Nil) -> Nil
  | (n, Cons (_, t)) -> drop (n - 1) (Lazy.force t)

let rec take (n : int) (lst : α stream) : α list = 
  match (n, lst) with
  | (n, _) when n < 0 -> invalid_arg "negative index in take"
  | (n, _) when n = 0 -> []
  | (_, Nil) -> []
  | (n, Cons (h, t)) -> h :: (take (n - 1) (Lazy.force t))

Which brings us to the lazy version of rotate expressed in about the same number of lines of code!

let rec rotate_left (k : int) (l : α list) : α list =
  let n = List.length l in
  if k >= 0 then
    l |> from_list |> repeat |> flatten |> drop k |> take n
  else rotate_left (n + k) l

let rotate_right (n : int) : α list -> α list = rotate_left (-n)

Saturday, May 30, 2015

Run length encoding data compression method

Functional programming and lists go together like Fred and Ginger. This little exercise is one of Werner Hett's "Ninety-Nine Prolog Problems". The idea is to implement the run length encoding data compression method.

Here's how we start. First we write a function that packs consecutive duplicates of a list into sublists e.g.

# B.pack ['a'; 'a'; 'a'; 'b'; 'c'; 'c'; 'd'] ;;
- : char list list = [['a'; 'a'; 'a']; ['b']; ['c'; 'c']; ['d'];]

Then, consecutive duplicates of elements are encoded as terms $(N, E)$ where $N$ is the number of duplicates of the element $E$ e.g.

# B.rle (B.pack ['a'; 'a'; 'a'; 'b'; 'c'; 'c'; 'd'; 'e'; 'e']) ;;
- : (int * char) list = [(3, 'a'); (1, 'b'); (2, 'c'); (1, 'd'); (2, 'e')]

We will of course require a function to decode compressed data e.g.

 # B.unrle(B.rle(
     B.pack ['a'; 'a'; 'a'; 'b'; 'c'; 'c'; 'd'; 'e'; 'e'])
    ) ;;
- : char list = ['a'; 'a'; 'a'; 'b'; 'c'; 'c'; 'd'; 'e'; 'e']

(Credit goes to Harvey Stein for the names rle and unrle by the way).

So that's it for the first iteration - here's some code that aims to implement these specifications.

module B = struct

let pack (x : α list) : α list list =
  let f (acc : α list list) (c : α) : α list list =
    match acc with
    | (((b :: _) as hd) :: tl) when c = b -> (c :: hd) :: tl
    |  _ -> [c] :: acc
  in List.fold_left f [] x

let rle (x : α list list) : (int * α) list =
  let f (acc : (int * α) list) (l : α list) : (int * α) list =
    (List.length l, List.hd l) :: acc
  in List.fold_left f [] x
  
let unrle (data : (int * α) list) =
  let repeat ((n : int), (c : α)) : α list =
    let rec aux acc i = if i = 0 then acc else aux (c :: acc) (i - 1) in
    aux [] n in
  let f (acc : α list) (elem : (int * α)) : α list =
    acc @ (repeat elem) in
  List.fold_left f [] data
  
end

Now, pack is just a device of course. We don't really need it so here's the next iteration that does away with it.

module E = struct

let rle (x : α list) : (int * α) list =
  let f (acc : (int * α) list) (c : α) : (int * α) list =
    match acc with
    | ((n, e) :: tl) when e = c -> (n + 1, c):: tl
    | _-> (1, c) :: acc
  in List.rev (List.fold_left f [] x)

let unrle (data : (int * α) list) =
  let repeat ((n : int), (c : α)) : α list =
    let rec aux acc i = if i = 0 then acc else aux (c :: acc) (i - 1) in
    aux [] n in
  let f (acc : α list) (elem : (int * α)) : α list =
    acc @ (repeat elem) in
  List.fold_left f [] data

end

Nifty!

Ok, the next idea is that when a singleton byte is encountered, we don't write the term $(1, E)$ instead, we just write $E$. Now OCaml doesn't admit heterogenous lists like Prolog appears to do so we need a sum type for the two possibilities. This then is the final version.

module F = struct

  type α t = | S of α | C of (int * α)

  let rle (bytes : α list) : α t list =
    let f (acc : α t list) (b : α) : α t list =
      match acc with
      | ((S e) :: tl) when e = b -> (C (2, e)) :: tl
      | ((C (n, e)) :: tl) when e = b -> (C (n + 1, b)) :: tl
      | _-> S b :: acc
    in List.rev (List.fold_left f [] bytes)

  let unrle (data : (α t) list) =
    let rec aux (acc : α list) (b : α) : (int -> α list) = 
      function | 0 -> acc | i -> aux (b :: acc) b (i - 1) in
    let f (acc : α list) (e : α t) : α list =
      acc @ (match e with | S b -> [b]| C (n, b) -> aux [] b n) in
    List.fold_left f [] data

end

Having worked out the details in OCaml, translation into C++ is reasonably straight-forward. One economy granted by this language is that we can do away with the data constructor S in this version.

#include <boost/variant.hpp>
#include <boost/variant/apply_visitor.hpp>
#include <boost/range.hpp>
#include <boost/range/numeric.hpp>

#include <list>

//Representation of the encoding
template <class A> struct C { std::pair <int, A> item; };
template <class A> using datum = boost::variant <A, C<A>>;
template <class A> using encoding = std::list<datum<A>>;

//Procedural function object that updates an encoding given a
//datum
template <class A>
struct update : boost::static_visitor<> {
  A c;
  encoding<A>& l;

  update (A c, encoding<A>& l) : c (c), l (l) 
  {}

  void operator ()(A e) const { 
    if (e == c) {
      l.back () = C<A>{ std::make_pair(2, c) }; 
      return;
    }
    l.push_back (c);
  }

  void operator ()(C<A> const& elem) const { 
    if (elem.item.second == c) {
      l.back () = C<A>{ std::make_pair (elem.item.first + 1, c) };
      return;
    }
    l.push_back (c);
  }
};

template <class R>
encoding<typename boost::range_value<R>::type> rle (R bytes) {
  typedef boost::range_value<R>::type A;

  auto f = [](encoding<A> acc, A b) -> encoding<A> {
    if (acc.size () == 0)
      acc.push_back (b);
    else   {
      boost::apply_visitor (update<A>(b, acc), acc.back ());
    }
    return acc;
  };

  return boost::accumulate (bytes, encoding<A> (), f);
}

I've left implementing unrle () as an exercise. Here's a little test though that confirms that we are getting savings in from the compression scheme as we hope for.

int main () {
  std::string buf=
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
    "c"
    "ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
    "ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
    "ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
    "ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
    "ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
    "ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd"
    "eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee"
    "z";
  std::list<char> data(buf.begin (), buf.end());
  encoding<char> compressed = rle (data);

  std::cout << sizeof (char) * (data.size ()) << std::endl;
  std::cout << sizeof (datum <char>) * (compressed.size ()) << std::endl;

  return 0;
}

On my machine, this program prints the values $484$ and $72$.