Neil Mitchell's Blog (Haskell etc): December 2006

Saturday, December 30, 2006

End of year update

Since most people are doing a year summary, I thought I should do one. Then I remembered that I can barely remember what I did yesterday (for example today I went in to work only to discover it was a Saturday...). So instead I'm going to try and predict what I'll do in 2007, with the proviso that it's a complete pile of fiction, but some of it might actually happen.

PhD Catch

2007 should be the year I complete my PhD. Its certainly the last year I get any funding for my PhD. With aiming to complete my PhD, I also want to release a really good version of Catch - the Hsakell pattern match checker. I've been working away on this for over 2 years now, so would like something I can point at. I'm hoping Catch will be out by March, but I realise I said "a few months" at least 9 months ago now - it turns out the Halting Problem is quite hard...

Hoogle 4

I think Hoogle 4 has been knocking around for about 8 months now. I'd actually like to get this finished, and be the main version of the website. This brings just a complete polishing of Hoogle. Hoogle seems to be a well used tool, and hopefully Hoogle 4 will just make it better at everything and much more reliable.

Yhc

One project which I really want to polish off and put a release out of is Yhc. Yhc has lots of really exciting features, and is evolving pretty fast. The Yhc Javascript stuff is particularly interesting - the demos just keep getting better and better. The Yhc.Core side of things is looking really impressive. My optimisations stuff is starting to look really good. If I ever finish WinHaskell, Yhc might become a very nice choice for the default compiler.

Parsing

I really really want to finally implement my parser. Unfortunately, just taking a look at the things above this makes that rather unlikely. If you add in distractions like Hat, WinHaskell, WinHugs etc then it looks a lot less likely. As such I'm going to make the target to formalise my state machine algorithms and write a state machine library in Haskell which I can use as the basis for my parser.

I've tried to keep this list as short as possible, to actually give me a chance of doing some of it. We'll see if that happens.

Saturday, December 23, 2006

Hoogle Progress, HsMan features

About 2 months ago Frederik Eaton posted "HsMan" to the ghc list, a program that with one command line could look up the documentation for a function. Now Hoogle 4 has that feature :)

$ hoogle filepath combine /info /doc
System.FilePath.Version_0_09.combine :: FilePath -> FilePath -> FilePath

Combine two paths, if the right path isAbsolute, then it returns the second.

Posix: combine "/" "test" == "/test"
Posix: combine "home" "bob" == "home/bob"
Windows: combine "home" "bob" == "home\\bob"

http://www-users.cs.york.ac.uk/~ndm/projects/filepath/System-FilePath-Version_0_09.html#v:combine

What the above says is search the filepath module for the name combine, and when you find it, with the one that ranks best, display the haddock entry for it (info) and a link to further documentation (doc)

It was a reasonable amount of work to add, but there is a good reason for adding this feature, which will become clear when Hoogle 4 web version is relased :)

Friday, December 22, 2006

Hoogle 4 progress

I'm slowly working through the things I want for Hoogle 4, and its coming along quite nicely. A quick tour of the features I've already implemented:

Easier generation of Hoogle Databases

First create a Hoogle file using Haddock and Cabal:
> runhaskell Setup haddock --hoogle

Then convert it to a Hoogle binary database:
> hoogle /convert=base.txt

This will produce base.hoo, an optimised Hoogle database.

Faster text searches

Most searches are text searches, they used to linearly scan through the data, now they build a nice efficient index and hop directly to the relevant bit. This makes text searches over 100 times faster, and means they scale linearly with the number of results and the size of the text, rather than previously where they scaled with the size of the database.

As a result of having faster searches, I can now do multi-word searches, searching for two different words, and then matching between them. i.e. you can now search for "concat map" (or "map concat") and get concatMap returned.

Proper API

There is now a proper Hoogle API, which the Hoogle command line tool and web tool call out to. This will hopefully people can layer new things on top of Hoogle more easily.

Multiple package searching

You can now search within multiple packages at a time. For example "drop base filepath" will search both the base and filepath modules, as though they were one.

Restricting on module name

As requested you can now restrict searches by module name, for example "Prelude.map" and "map Prelude" will both do a search for the map function that resides in the Prelude.

Proper parsing

The parser is written in parsec now, much more reliable! And also much better error messages, hopefully this should make it easier to extend Hoogle in the future as well.

The Future

I broke quite a lot with the complete revamp of the code, so the idea is to fix up what I broke and try and push a new release out relatively soon.

Thursday, December 21, 2006

Will produce patches for PhD

I've just got back to my home, and am doing darcs pull's against all the repo's around to get my work up to date. I noticed Yhc has over 1000 patches, and then I pulled my primary PhD repo - and found I have 1626 patches! I have at least 100 more in a separate repo I use for one chunk of the code, and was working in CVS for over a year before moving to Haskell.

The Darcs repo for my PhD at home seemed dead, so I did a fresh darcs get, no checkpoints. It takes a while...

Friday, December 15, 2006

Advertising Haskell

A lot of posts on the mailing list at the moment seem to be about how we can promote Haskell to a greater audience. The idea of this post is to figured out why we might want to promote Haskell to a greater audience, which I think will give us a clue where to go. If Haskell was more popular then:

More Haskell jobs would emerge
People would find it easier to introduce Haskell in a work place
Less C in the world
More reliable software
Elegance and beauty

Of course, there are also downsides to Haskell being more popular:

Bigger communities aren't always as friendly
Commercialisation will loose some benefits of Haskell (open source etc.)
The volume of beginners will outweigh the experienced professionals
Managers might push Haskell for entirely unsuitable things

So, question, do we want Haskell to be more popular. At the moment I quite like the fact that when programming I can be much faster than other people, and have far fewer errors. It would however be nice if I could get a job doing Haskell at some point, rather than becoming a C monkey once more, which will be doubly painful after experiencing so much Haskell.

My personal view on how Haskell can be made more popular:

Remove features, generalise features, do not add features. I don't understand/know rank-n types, template Haskell, GADT's, impredictive polymorphism, arrows... Beginners don't want to know either! Getting the features right is good, which may mean adding features, but not necessarily.
Continue to be polite and helpful
Promote Haskell as a programming course
Promote Haskell to experienced programmers looking for something new
DO NOT promote Haskell in industry, if you force this on to people, they'll just discover we are not ready for this yet!

In my opinion, we do plenty of advertisments already. There is not nearly enough coding going on, Cabal could benefit from extra contributors, I know Yhc could etc. When the tools are ready the users will come, and then if they don't come naturally, we can prod them. Until that time the steady flow of new users is plenty.

Monday, December 11, 2006

bhc: Basic Haskell Compiler

I have been thinking Compilery thoughts for a while now, so I thought I'd jot them down. I am starting the "bhc" project here and now! The intention is that bhc will be a concept, and some thoughts, for at least another year - definately not anything as concrete as code!

The current Haskell compilery things are:

Hugs - fast to compile, slow to run
GHC - slow to compile, fast to run
Yhc/nhc - medium at both
Jhc - aiming for super slow to compile, super fast to run
...

What I would like to do is create a new front end for Yhc, taking the Yhc.Core and Yhc.ByteCode bits as they stand, and replacing the front end. I'd also like to base the entire code on libraries, rather than compiler passes, and make everything entirely reusable. I also want to aim for simplicity, elegance and readability.

So, first things first. Haskell is really the only language to implement this in. After that you have a few stages until you get to the Yhc.Core language:

Make (Cabal should do this)
Parsing
Name resolution
Type checking
Dictionary desugaring
Desugaring

There is only one way to write a parser, using my parsing system (not finished, barely even started). There is only one way to write a type checker, using my type checker generator (not started at all, barely even specified, not even a link, definately not proven!). Name resolution isn't that hard. Dictionary desugaring should use the special method of Catch (same as Jhc, I believe). Desugaring is trivial with Play classes (warning, if you follow that link, not finished!), I also want to have invertable desugaring, for analysis purposes. The parsing and typechecking would be standalone libaries.

Two of these things need to be written first, but thats part of the fun :)

Note that type checking does all the typey stuff, dictionary desugaring uses the types. Nothing else uses types, and in fact, I think this compiler should be untyped. (I know no one will agree with me, all I can say is that I think I'm right, but realise everyone thinks I'm wrong)

The next big decision is file formats: I would like to have a .bho (basic haskell object) file which corresponds to a single module, and a .bhl (basic haskell library) which is a whole library, and a .bhx (basic haskell executable) which is a whole program. Inside a .bho you would have (all items are optional):

Full original source code to the module
Desugarred source code, in Yhc.Core format
Bytecode, in Yhc.ByteCode format

A .bhl would have those three things, but linked together within a library. A .bhx would have all of that, including all the libaries, linked in as one.

I would also write an optimser, for whole program analysis, which took a .bhx, and produced an equivalent one. Probably also a compiler to C code, for super-fast execution.

So what in this compiler would be new?

Focus on libraries, Yhc.Core, Yhc.Parse, Yhc.ByteCode, Yhc.TypeCheck
Invertable desugaring
Extensive use of the Play class
Better library story (one library, one file)
Standalone crossplatform executables
Fast whole program analysis
Brand new parsing system
Untyped Core language (compared to other optimising compilers)
Simple

So, who wants to have some fun?

Tuesday, December 05, 2006

Generalised Haskell

In the past few days I've been working on a generalised version of Haskell, designed for program optimisation and transformation. I've also implemented the start of a Haskell optimiser, but in about 100 lines of Haskell, using this generalised form of Haskell.

Generalised Haskell is very much like a normal Haskell Core language, with less restrictions in pattern matching. In normal Haskell you would write "map" as:

> map f [] = []
> map f (x:xs) = f x : map f xs

Which would be compiled into:

> map f x = case x of
> ____ [] -> []
> ____ (x:xs) -> f x : map f xs

The rule for pattern matching is that you take the first rule that matches. Generalised Haskell relaxes this slightly to take any one rule that matches, but not necessarily in order. The second relaxation is that when pattern matching on the left hand side, functions can appear as patterns.

For example, you can now define:

> map id x = x
> map f x= case x of
> ____ [] -> []
> ____ (x:xs) -> f x : map f xs

Note that the "map id" case looks very much like GHC's Rules feature. The advantage for optimisation is that many activities can be encoding using these pattern matches - arity raising, specialisation, deforestation.

I am still working on a tool that uses this technique for optimisation, but initial results look quite promising, while the code remains impressively short.

Neil Mitchell's Blog (Haskell etc)