Dhall, gradual typing and AWS CloudFormation

We've been having a problem at work: the CloudFormation template deploying the necessary infrastructure grew into a giant sheet with many repetitive parts. To add a new resource (SNS topic) we have to edit three lists each time: the topic list, their permissions and triggers which fire the changes.

Ideally we'd want to list the topics in one place and generate the full template automatically. It's easy to write such a generator but it'd be nice to use a ready-built solution which we wouldn't have to onboard the current and new team members to specifically.

There are several CloudFormation generators, for example, in Scala and Python: CloudFormation Template Generator, troposphere. We've tried CloudFormation Template Generator, but it doesn't support cyclical references between resources, so we've stopped working on the problem for a while.

This time we've decided to try Dhall instead of a specialised generator. It is similar to JSON in syntax, can call functions, is strongly typed and can generate YAML and JSON (there are many other benefits not directly related to CF that I'm omitting here).

The advantage of Dhall over concrete generators is that despite fully static type checking, Dhall supports progressive typing: you don't have to explicitly mark the type of each value, it can be inferred automatically. This allows to skip defining all the possible types of resources and such beforehand. On the other hand, if a generator doesn't support a particular AWS resource type, the generator itself must be modified.

We've taken the existing CloudFormation template as the base and started to implement it in Dhall straight on. The first level only required cosmetic changes:

let mkParam = \(desc: Text) -> { `Type` = "String", Description = desc } in
{ AWSTemplateFormatVersion = "2010-09-09"
, Description = "ACME bucket"
, Parameters = { BucketName = mkParam "Name of the ACME bucket"
               , LogsBucketName = mkParam "Name of the ACME logs bucket"
               , DeploymentsBucketName = mkParam "Name of the ACME deployments bucket"
               }
}

The mkParam function already saved a few lines, but the most interesting is ahead - the Resources object.

The resource list in the existing template looks like this:

Resources:
  ACMEBucket:
    Type: AWS::S3::Bucket
    # ...
  ACMELogsBucket:
    Type: AWS::S3::Bucket
    # ...
  ACMEFooCreatedTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: ACMEFooCreated
  ACMEBarCreatedTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: ACMEBarCreated
  ACMEBazCreatedTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: ACMEBazCreated
  ACMEQuuxCreatedTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: ACMEQuuxCreated
  # And tens of similar topics

According to YAML (and Dhall's YAML converter) Resources is an object, that is, a structure type with fields of ACMEFooCreatedTopic, ACMEBarCreatedTopic and so on. This is no good if we want to generate many similar resources.

Fortunately, the YAML converter automatically converts arrays of objects with mapKey and mapValue keys into objects with keys that we need, for example:

[ { mapKey = "one"
  , mapValue = 1
  }
, { mapKey = "two"
  , mapValue = 2
  }
]

gets converted to:

two: 2
one: 1

But it turns out this isn't the main problem.

The end of gradual typing

So we have a lot of similar topics which we can represent as:

{ name: Text
, resourceName: Text
}

and put into the resource array using a trivial function converting this representation to an SNS resource.

And here the gradual typing stops and problems start:

In Dhall the function argument types must be specified explicitly.

Therefore, we must write:

let Topic = { name: Text
            , resourceName: Text
            } in
let topicResource = \(topic: Topic) -> { mapKey = topic.topicResourceName
                                       , mapValue = { `Type` = "AWS::SNS::Topic"
                                                    , Properties = { TopicName = topic.name }
                                                    }
                                       } in

But at least the Topic type isn't that large? We just need to define the topics and convert them to a resource list using map?

In Dhall polymorphic functions accept types parametrising them explicitly.

Therefore, map must always have four arguments. All of those must be specified explicitly.

let Resource = { mapKey: Text
               , mapValue: { `Type`: Text
                           , Properties: { TopicName: Text }
                           }
               } in
map Topic Resource topicResource topics

Okay, but at least the resource corresponding to topics isn't too complex?

Unfortunately there are things other than topics in the resource list, and all their possible properties have to be included in Resource, which in addition is going to become a sum type:

let Resource = < Topic: { `Type`: Text
                        , Properties: { TopicName: Text }
                        }
               | Bucket: { `Type`: Text} |
                         , DependsOn: List Text
                         , Properties: { AccessControl: Text
                                       , BucketName: Text
                                       , LoggingConfiguration: { DestinationBucketName: Text
                                                               , LogFilePrefix: Text
                                                               }
                                       , VersioningConfiguration: Optional { Status: Text }
                                       , LifecycleConfiguration: Optional { Rules: List LifecycleRule }
                                       -- ...
                                       }
               > in
let topicResource = \(topic: Topic) -> Resource.Topic { ... }

Note that in real CloudFormation a !Ref SomeOtherResource, !Sub and so on can appear at any point. Because our template uses those, we had to replace Text with AWSValue, also declared as a sum, in all those places, and call AWSValue.Ref and AWSValue.Sub where needed (and AWSValue.Text everywhere else).

Here my attempt to translate the whole template to Dhall ended. Defining all the possible CloudFormation parameters that are used in our template and add new ones every time one is required is too costly.

Furthermore, two other problems not as critical:

All record fields are mandatory.

To simulate an optional field, it must be declared as, for example, Optional Text in the record constructor, and every concrete value must still have it defined, only the YAML converter will remove it (with the --omitNull option).

let Foo = { Always: Text
          , Sometimes: Optional Text } in
{ Always = "there"
, Sometimes = None Text } : Foo

(Since Optional is parametrised by the inner type, one must specify Text explicitly).

The official recommendation when there is a lot of optional fields is to define an "empty" record and update it:

let defaultFoo = { Always = ""
                 , Sometimes = None Text
                 } in
defaultFoo { Always = "There" }

let introduces only one variable, and several nested let are converted by dhall-format to a ladder JavaScript would be proud of.

Here is the start of the file describing our template after running it through dhall-format:

let map = https://prelude.dhall-lang.org/List/map

in  let Notification =
		  ./notification.dhall

	in  let Export =
			  ./export.dhall

		in  let Resource =
				  ./resource.dhall

			in  let exports =
					  ./exports.dhall

				in  let wholeBucket =
						  ./wholeBucket.dhall

					in  let LifecycleRule =
							  ./lifecycleRule.dhall

						in  let mkParam =
									λ(desc : Text)
								  → { `Type` = "String", Description = desc }

							in  { AWSTemplateFormatVersion =
									"2010-09-09"
                                -- ...
                                }

What to do next?

If you need to generate CloudFormation using Dhall, most likely you'll have to take Amazonka or something like it as a base and generate all the CloudFormation types automatically, specifying precise types wherever needed.

How to change Dhall to support gradual typing?

If we take as granted the fact that the type parameters have to be passed explicitly, maybe we can allow passing _ as the type or a part of it, and infer the missing type during the unification process, something like:

let Resource = { `Type`: Text
               , Properties: _ }

A more radical, but not as elegant a solution - introduce an explicit Dump type which any value can be converted to, but so that the values of type Dump support no operations on them. The only thing that can be done is transform them into YAML/JSON, the result of which would be the transformation of the original value. Then the CloudFormation functions will return Dump, which we'll be able to combine but not modify, and, for example, the list of resources would look like:

List { mapKey: Text
     , mapValue: Dump }

This approach doesn't guarantee the correctness of the values but doesn't require specifying the Resource type fully. However, the type can be refined, for example, by writing instead of Dump:

let BucketProperties = { BucketName: Text } //\\ Dump

getting a type with one guaranteed field and arbitrary other.

Goodreads

Goodreads recommendations are too good and awful at the same time: they showed me a series that I'm watching already (series based on books), but... the last book in the series. With spoilers.

Wait for a while now, not visiting until I finish it.

Olympic import sorting

Many programming languages recommend the following order of imports: standard library first, then all the third party modules sorted alphabetically, and finally, program's own modules.

Here's an example from Python:

import os
import sys

from django.db import models
import django.views

from my_package import my_module

If you look closely, this is the order the countries come out during the Olympic games opening ceremony!

Greece
All the other countries alphabetically
Host country

Let's call this import order "olympic sorting".

GHC.Generics talk

Slides from the YOW! Lambda Jam 2017 about GHC.Generics.

Name levels

At first, programs have objects from the problem domain - be it Song, Fruit or House.

Of course, you need abstractions - Object, Model, Class (and perhaps Metaclass) or Message. The customers aren't interested in these, but they are real words at least.

And when that is not enough, enter concepts named after publication authors: Kan extension, Yoneda lemma and Lie group. They don't have corresponding words in the language, so perhaps even the authors don't have hopes of their wide usage.

P.S. Of course, you can just name your types K1 and M1.

Review: Working Effectively with Legacy Code

Thanks to the library of the company I work in for the ability to request books. "Working Effectively" is more than 10 years old, but the title looked promising.

In essense, all the book advocates is divide (the code into separate pieces), write tests and conquer (make changes). Perhaps about 70% is devoted to the ways of how to achieve that, with examples in Java. So knowledge of Java or at least the OO paradigm is a must before reading (and C and C++ desirable).

Of course, in the time being the programming languages and tools have changed, and it was interesting to compare whether the described techniques are now simpler, obsolete or new ones are needed.

On one hand, Java, being a statically typed language with rather strict decisions, makes one resort to building a complex web of interfaces, overridden methods and objects in the hard cases. To my knowledge, it now has tools that allow circumventing the inheritance, method visibility and other restrictions when testing. But it isn't worthwile in all cases - improving the code structure, even just to run the tests, is beneficial for changes later.

I won't discuss the C++ examples at all, because all they are is cheating on the compiler and linker. A lot of places use macros - I wonder if the author ever got a project where the macros themselves needed testing? However, good to read for the general knowledge.

The place where I envy the author is reliance on the compile stage errors. Thanks to the strict typing, practically the only place where a runtime error is possible is the implicit float to int conversion, and this is pointed out several times in different chapters. On the other hand, the author is heavily exploiting null, suggesting passing if an object of the needed type can't be constructed and hope it works.

There are no higher order functions in the book - perhaps Java didn't have delegates at the time. In C++, of course, one can make callable objects and even custom method tables, but, again, doing it just for testing is an overkill.

An interesting technique I saw that doesn't depend on particular language or paradigm is "scratch refactoring". Having saved the current state in a version control system, you start changing everything you can, however you want, paying no attention to the tests. If the changes touch several areas, that's still fine, continue as you please. And when there is finally a satisfactory result... remove all the changes, revert to the saved version and start again, with tests. The refactoring itself won't advance very far, but the experiment will give you an understanding of the direction to take.

One more note is about the architecture. Very often its understanding is different between team members, especially if the project has an architect detached from the real life. For this not to happen, the author suggests periodically explaining the system or its components' functionality, assuming no prior knowledge, in a limited number of sentences. Thus all the participants get an idealized version of the architecture, and subconsiously strive to support and enhance it instead of making random changes. I wonder whether these idealized descriptions can go into the modules' documentation, or it is worthwile making the team members make them up from scratch every time?

The last thing I want to point out is that the author focuses on writing unit tests. There is a good criterion for them - they are the tests that can run faster than 0.1 seconds. This resolves the contradiction in the "testing in isolation" definition - it's obvious that the + operator, when it is used in the tested method, doesn't need a mock, but an external service call, like a database, and similar must be excluded. It is a pity that Django, for example, practically imposes the reverse approach, where there are extensive tools for using ORM when testing and almost none for replacing it.

General recommendation: for working with object-oriented code, worth a read. Techniques specific to a particular language can be leafed through.

HackerX Melbourne

I've got an invitation to HackerX. I didn't think it was particularly worthwhile after I've got a repeat notification seeing as I didn't register on the first one, but decided to check it out. By the way, I don't know where did they get my address from - during the registration I was asked for LinkedIn and GitHub so perhaps not from there.

The event itself was crowded. Perhaps 50 people in the room, and that's considering a lot of name tags didn't get picked up. About 15 companies, the only familiar name was Telstra.

But that's not the main thing. During the registration they let me choose my programming languages, asked about GitHub and LinkedIn, but after the introduction they just let everyone take a spot at one of the employers' desks at will (5 minutes to a company). So, as I understand it, there was no matching of companies to developers based on interest - just a career fair.

Best REPL

What's 1 + 2? Let's ask Elm:

---- elm-repl 0.17.1 -----------------------------------------------------------
 :help for help, :exit to exit, more at <https://github.com/elm-lang/elm-repl>
--------------------------------------------------------------------------------
> 1+2
Error: The following HTTP request failed.
<http://package.elm-lang.org/all-packages?elm-package-version=0.17.1&since=2016-10-24%2008%3A52%3A34.764973%20UTC>

FailedConnectionException2 "package.elm-lang.org" 80 False getAddrInfo: does not exist (Name or service not known)

The most interesting result of any REPL I've ever seen. Will try to convince the workplace to use it.

XMonad via Stack

XMonad can be installed via Stackage. But it's not enough to install, it needs to compile the user configuration - xmonad.hs. Being installed through stack install, XMonad tries to call the system GHC, which expectedly can't find the required libraries.

Ideally, XMonad could understand that it's installed through Stack, and called the corresponding compiler. There are precedents - HLint, for example, loads the required libraries for checking the code if it finds stack.yaml. But meanwhile it can be done manually.

XMonad calls GHC to compile the configuration, so one must first point to it from a wrapper script ~/bin/xmonad:

#!/bin/sh
export PREVPATH=$PATH
GHC_PACKAGE_PATH=$(stack exec env | grep GHC_PACKAGE_PATH | sed s/.\\+=//g 2>/dev/null)
export GHC_PACKAGE_PATH
PATH=$(stack exec env | grep ^PATH | sed s/.\\+=//g 2>/dev/null)
export PATH
exec $HOME/.local/bin/xmonad "$@"

This script has to be put in a directory preceding the one used by Stack - ~/.local/bin - in $PATH.

Now XMonad will call GHC installed through Stack, compile the configuration and launch the result. But all the other programs launched from within it will inherit the same settings, which confuses, among others, Stack itself. To fix this, we need to reinstate the system settings after the launch. Thus, in xmonad.hs:

main = do
    unsetEnv "GHC_PACKAGE_PATH"
    getEnv "PREVPATH" >>= setEnv "PATH"
    unsetEnv "PREVPATH"
    -- ...

The full configuration can be found in my repository:

Update

XMonad 0.13, or any version with e159ec3, supports a custom user build script that should make most of the above unnecessary.

Wintergatan

I liked what Kraftwerk did in the start of their career: unique, custom made instruments producing sound that wasn't heard before. Clips and concerts where you couldn't tell real people from robots. But 40 years passed, synthesizers and computers are used by more people than know what those mean, and Kraftwerk, unfortunately, have done nothing new.

Enter Wintergatan: Marble Machine, Starmachine2000 and Sommarfågel. It's not a perfec comparison, but here it is, new sound and self-made instruments. No grandiose ideas yet, but let's see what comes next...

Koterpillar

The end of gradual typing

What to do next?

Update