image processing

These are the stories that have been posted to the image processing category.

Image Processing as Sets of Transformations


Published to Rick Minerich's Development Wonderland by Richard Minerich April 28, 2009 00:21

In the image processing world, as with most computational problems, we often think of our work as composed of only two basic ideas: representation and transformation. Of course, one may have many layers of both representations of transformations and transformations of representations, which can make things appear quite complex at times.

However, the problem is much simpler than it appears. This is because a representation can be considered a transformation from a zero or identity state. Thus, in writing a symbolic language for image processing, we are left with only a single idea to consider: transformations. By composing layers of transformations, we can apply image processing techniques in a way that is not only bidirectional and platform agnostic but also comes with a host of other benefits.
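
As a minimal sketch of this view in F#, assume purely for illustration that an image is a flat array of 8-bit samples (the `Transform` alias and all function names below are hypothetical). A stored representation then becomes a transformation from an empty "zero" image, and the built-in `>>` operator composes transformations into new ones:

```fsharp
// Hypothetical model: an image is a flat array of 8-bit samples and a
// transformation is any function from one image to another.
type Transform = byte[] -> byte[]

// The "zero" state: an empty image.
let zero : byte[] = [||]

// A representation viewed as a transformation: it ignores its input and
// produces the stored samples (copied, so callers cannot mutate the original).
let representation (samples: byte[]) : Transform = fun _ -> Array.copy samples

// Two simple per-sample transformations.
let invert : Transform = Array.map (fun v -> 255uy - v)
let halve : Transform = Array.map (fun v -> v / 2uy)

// Composition with >> turns a chain of transformations into a single one.
let pipeline : Transform = representation [| 0uy; 100uy; 255uy |] >> invert >> halve

let result = pipeline zero   // [| 127uy; 77uy; 0uy |]
```

Because `id` is the identity transformation, `id >> pipeline` behaves exactly like `pipeline`.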

 

Let us consider a simplified example of processing an image:

1) We read in a file (representation) and use a codec (transformation) to convert it into a format understood by our API (representation).

2) We then perform some type of algorithm on that data (transformation) which results in some type of output (representation).

3) Finally, via another codec (transformation), another file is saved to disk (representation).

 

In most cases there are a great number of intermediate representations. Each is a full copy of the previous iteration with whatever changes have been applied so far. Essentially, the same information is copied over and over again in memory. Some kinds of in-place processing are possible; however, this is problematic because, once the operation has completed, the previous representation has been destroyed.
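
To make that cost concrete, here is a small sketch using plain F# arrays and two hypothetical per-sample functions. The step-by-step version materializes a complete intermediate image at every stage; the fused version composes the per-sample functions first and traverses the image once. Both produce the same output:

```fsharp
// Two per-sample operations on 8-bit values (hypothetical examples).
let brighten (v: byte) = if v > 235uy then 255uy else v + 20uy   // saturating add
let invert (v: byte) = 255uy - v

// Step by step: each Array.map allocates a full intermediate image.
let stepwise (img: byte[]) =
    img |> Array.map brighten |> Array.map invert

// Fused: compose the per-sample functions, then make a single pass.
let fused (img: byte[]) =
    img |> Array.map (brighten >> invert)

let sample = [| 0uy; 128uy; 250uy |]
```

For `sample`, both `stepwise` and `fused` return `[| 235uy; 107uy; 0uy |]`, but only `stepwise` allocates the extra array in between.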

 

Instead, what if we batched up sets of transformations?  This could have many benefits:

1) The most obvious benefit is that of parallelization.  Even at the simplest level of functional composition, these transformations could be handed off to a cluster for asynchronous processing or saved for a later batch processing job.

2) With an intermediate symbolic transformation language, processing algorithms could potentially be combined and reduced to produce a single transformation out of many.  This would significantly reduce the processing overhead as well as the number of intermediate memory representations.

3) An intermediate symbolic language that encompassed both codec and processing might make it possible to push the processing transformation through the codec transformation, eliminating the need for any intermediate in-memory representation. This could provide significant savings in both memory and processing time.

4) The intermediate symbolic language could be saved into the files themselves, removing the need for the codec to be present on the end machine. Admittedly, the user would still need the image language interpreter.

5) Instead of applying simple image processing algorithms to an image, the symbolic representation could be appended to the end of the file.  This would be quite similar to layers in practice.  In this way it would be possible to view the image at all stages of transformation.

6) For large or proprietary transformations, the representation could be kept on the internet and either be downloaded or, in the case where the owner did not want to expose their algorithm, a flattened representation could be sent out and a processing delta could be sent back.
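
As a rough illustration of benefit 2, here is a hypothetical two-operation "language" (the `Op` type and every function below are invented for this sketch). `Invert` and a wrap-around `AddMod` were chosen because they combine exactly; clamped brightness adjustments would not, since clamping discards information. `simplify` cancels or merges adjacent operations, and `run` compiles the reduced program into one per-sample function so the image is traversed only once:

```fsharp
// A tiny, hypothetical symbolic transformation language.
type Op =
    | Invert          // v -> 255 - v
    | AddMod of int   // v -> (v + n) mod 256, wrapping around

// Normalize an offset into the range 0..255.
let norm n = ((n % 256) + 256) % 256

// Push an operation onto a reversed program, merging it with the previous one when possible.
let push (acc: Op list) (op: Op) =
    match op, acc with
    | Invert, (Invert :: rest) -> rest                  // double inversion cancels
    | AddMod b, (AddMod a :: rest) ->
        let n = norm (a + b)                           // consecutive offsets add up
        if n = 0 then rest else AddMod n :: rest
    | AddMod b, _ when norm b = 0 -> acc                // adding zero is a no-op
    | _ -> op :: acc

// Reduce a program to an equivalent, shorter program.
let simplify (ops: Op list) = ops |> List.fold push [] |> List.rev

// Meaning of a single operation on one sample.
let applyOp op (v: byte) =
    match op with
    | Invert -> 255uy - v
    | AddMod n -> byte ((int v + norm n) % 256)

// Compile a program into a single per-sample function.
let compile (ops: Op list) : byte -> byte =
    ops |> List.fold (fun f op -> f >> applyOp op) id

// Simplify, compile, then map over the image once.
let run ops (img: byte[]) = Array.map (compile (simplify ops)) img
```

With this sketch, `simplify [ Invert; AddMod 10; AddMod 246; Invert; AddMod 3 ]` reduces to `[ AddMod 3 ]`, so `run` performs one cheap operation per sample instead of five.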

 

Conclusion

Of course, when I speak of data I don’t only mean the image itself. This technique could also be applied to many classes of data or algorithms, most notably, for us, image metadata.

My initial goal is to build a basic codec representation along with some simple transformations.  Currently, I am researching bidirectional, reversible and declarative languages as examples.  With F# as a base language I believe it will be possible to build something portable to other ML variants.

Image Processing in F#: From Image File to Array


Published to Rick Minerich's Development Wonderland by Richard Minerich September 03, 2009 18:09

F# has fantastic array manipulation functionality. To leverage this functionality for some very elegant image processing, it is first necessary to convert image files into byte arrays. Unfortunately, this process is not as simple as one might hope.

It’s a dark path of missing documentation, incorrect code samples and some ugly .NET interop.  With all of the other difficulties involved, I want to otherwise keep things simple and so will make the following assumptions:

  1. The image file will be properly handled by .NET’s image codecs (they are known to have some issues with TIFFs in particular)
  2. The image format is 24 bits per pixel BGR
  3. The end user will handle exceptions

By far the most difficult thing in writing this small sample was that every BitmapData implementation I ran into seemed to be completely broken. In fact, this was the case even for the .NET Framework SDK sample code. For each example I tried, the following two tests succeeded:

[<Fact>]
member x.first_matches_GetPixel() =
  let pixel = bmp24Bgr.GetPixel(0, 0)
  Assert.Equal( pixel.B, array24Bgr.[0] )
  Assert.Equal( pixel.G, array24Bgr.[1] )
  Assert.Equal( pixel.R, array24Bgr.[2] )

[<Fact>]
member x.last_on_first_scanline_matches_GetPixel() =
  let pixel = bmp24Bgr.GetPixel( bmp24Bgr.Width - 1, 0 )
  let offset = (bmp24Bgr.Width - 1) * 3
  Assert.Equal( pixel.B, array24Bgr.[offset] )
  Assert.Equal( pixel.G, array24Bgr.[offset + 1] )
  Assert.Equal( pixel.R, array24Bgr.[offset + 2] )

Unfortunately, the following two tests would fail:

[<Fact>]
member x.first_on_last_scanline_matches_GetPixel() =
  let pixel = bmp24Bgr.GetPixel( 0, bmp24Bgr.Height - 1 )
  let offset = bmp24Bgr.Width * (bmp24Bgr.Height - 1) * 3
  Assert.Equal( pixel.B, array24Bgr.[offset] )
  Assert.Equal( pixel.G, array24Bgr.[offset + 1] )
  Assert.Equal( pixel.R, array24Bgr.[offset + 2] )

[<Fact>]
member x.last_matches_GetPixel() =
  let pixel = bmp24Bgr.GetPixel( bmp24Bgr.Width - 1, bmp24Bgr.Height - 1 )
  let offset = (bmp24Bgr.Width * bmp24Bgr.Height * 3) - 3
  Assert.Equal( pixel.B, array24Bgr.[offset] )
  Assert.Equal( pixel.G, array24Bgr.[offset + 1] )
  Assert.Equal( pixel.R, array24Bgr.[offset + 2] )

On closer inspection it seemed something was turning everything after the first scanline into garbage.

Well, it turns out that each scanline in BitmapData’s allocated memory block can have some padding at the end. This means you can’t simply block-copy the memory into your array, as is done in so many examples. Instead, it is necessary to iterate over each scanline and copy only up to the padding. This fact is missing from everywhere but a small corner of the .NET Framework BitmapData documentation.
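
To see why a single block copy goes wrong, here is a small simulation that needs no System.Drawing. `paddedStride` applies the common DIB rounding rule (each row rounded up to a multiple of 4 bytes). That rule is an assumption made for illustration; code working with a real `BitmapData` should read its `Stride` property instead. The hypothetical `unpad` function mirrors the per-scanline copy described above, using `Array.blit` on a managed buffer in place of `Marshal.Copy` from unmanaged memory:

```fsharp
// Assumed DIB-style rule: scanlines padded to a multiple of 4 bytes.
// Real code should use BitmapData.Stride rather than computing it.
let paddedStride (width: int) (bitsPerPixel: int) =
    ((width * bitsPerPixel + 31) / 32) * 4

// Copy each scanline out of a padded buffer, dropping the padding bytes.
let unpad (width: int) (height: int) (bytesPerPixel: int) (stride: int) (padded: byte[]) =
    let rowBytes = width * bytesPerPixel
    let tight : byte[] = Array.zeroCreate (rowBytes * height)
    for row in 0 .. height - 1 do
        Array.blit padded (row * stride) tight (row * rowBytes) rowBytes
    tight

// A 3x2 image at 24 bits per pixel: 9 bytes of pixels per row, stride 12.
let width, height = 3, 2
let stride = paddedStride width 24
let padded =
    Array.init (stride * height) (fun i ->
        if i % stride < width * 3 then byte i else 0xEEuy)   // 0xEE marks padding
let tight = unpad width height 3 stride padded
```

In this layout, pixel (x, y) starts at `y * stride + x * 3` in the padded buffer but at `(y * width + x) * 3` in the packed array. The two offsets agree only on the first scanline, which matches the pattern of passing and failing tests above.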

A big thanks to Bob Powell for his article on exactly how the BitmapData class works.  Once I saw his diagram, I knew exactly what was going on and reworked my code to Marshal the data out of each scanline separately, minus the padding.

open System.Drawing
open System.Drawing.Imaging
open System.Runtime.InteropServices

/// Copies a bitmap's pixels into a tightly packed byte array,
/// skipping the padding at the end of each scanline.
let getBytesFromBitmap (bytesPerPixel: int) (bmp: Bitmap) =
  let imgRect = new Rectangle(0, 0, bmp.Width, bmp.Height)
  let bmpBits = bmp.LockBits(imgRect, ImageLockMode.ReadOnly, bmp.PixelFormat)
  try
    // Bytes of real pixel data per scanline (excluding padding)
    let scanlineBytes = bmp.Width * bytesPerPixel
    let byteArray: byte[] = Array.zeroCreate (scanlineBytes * bmp.Height)
    // Stride is the length of one scanline in memory, padding included
    let stride = nativeint bmpBits.Stride
    for row in 0 .. bmp.Height - 1 do
      let bmpOffset = stride * nativeint row
      let arrayOffset = scanlineBytes * row
      // Copy only this scanline's pixel bytes, leaving the padding behind
      Marshal.Copy(bmpBits.Scan0 + bmpOffset, byteArray, arrayOffset, scanlineBytes)
    byteArray
  finally
    bmp.UnlockBits(bmpBits)

This code is not as pretty or efficient as it might be, but at least it works and is fairly safe.  I’d love to clean this up and so I encourage you to leave critiques and/or example code in the comments section.
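
Once the pixels sit in a tightly packed BGR array, ordinary F# array functions are enough for simple processing. As one hedged example, the hypothetical `toGray` below converts 24-bit BGR data to one byte per pixel using the common ITU-R BT.601 luma weights (0.299 R + 0.587 G + 0.114 B), computed in integer arithmetic:

```fsharp
// Convert tightly packed 24bpp BGR bytes to 8-bit grayscale (one byte per pixel).
// Weights are the BT.601 luma coefficients scaled by 1000; integer division truncates.
let toGray (bgr: byte[]) : byte[] =
    Array.init (bgr.Length / 3) (fun i ->
        let b = int bgr.[3 * i]
        let g = int bgr.[3 * i + 1]
        let r = int bgr.[3 * i + 2]
        byte ((299 * r + 587 * g + 114 * b) / 1000))

// One red, one white and one black pixel, in B, G, R order.
let gray = toGray [| 0uy; 0uy; 255uy; 255uy; 255uy; 255uy; 0uy; 0uy; 0uy |]
```

Applied to the output of `getBytesFromBitmap 3 bmp`, this would yield `bmp.Width * bmp.Height` gray values in scanline order.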

Also, I’ve made my Visual Studio 2008 solution containing the above code and a few small extras available. Let me say again that any comments or critiques are welcome. I hope that in posting this I’ve saved you some of the pain I had in writing it.

Ninja Edit:  The idea of this function exiting with still locked bits was really bothering me and so I put everything after LockBits into a try-finally.

F# Discoveries This Week 10/04/2009


Published to Rick Minerich's Development Wonderland by Richard Minerich October 05, 2009 01:57

I’m back from my three-week vacation and am just about buried in fascinating functional programming links. I’ve managed to get through most of them and have selected the best of these for this very special welcome-back edition of Discoveries This Week.

 

CUFP (Commercial Users of Functional Programming) 2009 Videos

“Functional languages have been under academic development for over 25 years, and remain fertile ground for programming language research. Recently, however, developers in industrial, governmental, and open-source projects have begun to use functional programming successfully in practical applications. In these settings, functional programming has often provided dramatic leverage, including whole new ways of thinking about the original problem.”

 

M<’a> Lib (F# and C# LINQ) Monads Library

“Unified collection of Monads (M, unit, *) implemented in the Microsoft F# Language.”

Implemented so far: Identity, Maybe, State and List with many more to come.  The project lead is actively looking for help. 

 

Flying Frog’s F# vs OCaml: Image Processing

“Fortunately, this inefficiency can be overcome by using Just-In-Time (JIT) compilation instead of static compilation and partially specializing polymorphism away before JIT compilation. This is the intended design for polymorphism in HLVM and the inspiration was drawn from Microsoft's excellent implementation of the CLR.

Consequently, the equivalent F# program is 100× faster than the OCaml.”

This JIT optimization speed enhancement is astounding by any measure.

 

HPC Development Using F#

“This white paper introduces the F# programming language in the context of technical computing, and demonstrates how F# can be used for both shared-memory parallel programming using the Task Parallel Library, and distributed parallel programming using a Windows HPC Server 2008-based cluster and the Message Passing Interface (MPI).”

 

Matthew Podwysocki’s Generically Constraining F# Parts One, Two and Three

“Generic constraints inside .NET has always been a fun enterprise, especially given how C# handles them. There has been some discussion on Jon Skeet’s blog about the fact that C# does not allow for generic constraints referring to a number of types. […] However, as Jon correctly points out, this is indeed supported by the CLR directly. In fact, with our knowledge of F# constraints, we can write this exact function in F# without any such issue.”

F# - Designing Functional Interfaces for Pipelining


Published to Rick Minerich's Development Wonderland by Richard Minerich November 12, 2009 18:58

So you have an object-oriented library but want to be able to use F#’s functional pipelining feature to design expressive data processing workflows. How do you go about it?

First, let’s set a goal: some low-hanging fruit, so to speak. Let’s pretend we have a set of images we want to load, resize, intensify and save as PNG files for later use in an online image gallery.

 

Image Processing In C#

For reference, this would look something like the following using our DotImage toolkit in C#:

public void ProcessImage(string fromfile, string tofile)
{
    ImageCommand[] commands = new ImageCommand[] {
        new ChangePixelFormatCommand( PixelFormat.Pixel8bppIndexed ),
        new ResampleCommand( new Size( 800, 600) ),
        new IntensifyCommand( 50.0 )
    };

    AtalaImage currentImage = new AtalaImage(fromfile);
    foreach (var command in commands)
    {
        currentImage = command.Apply(currentImage).Image;
    }

    currentImage.Save(tofile, new PngEncoder(), null);
}

It’s a testament to the skill of our Senior Architect that this task is as simple as it is in C#. 

Note: our image representation is IDisposable and for optimal performance should immediately be disposed when done being used.  I’ll be covering how to leverage F#’s type system to handle this in a later post.

 

Now, in F#

In comparison, this is how I envision this same process using F#’s functional pipelining style:

let processImage infile outfile =
  Image.fromFile infile
  |> Image.changePixelFormat Image.PixelFormat.Pixel8bppIndexed
  |> Image.resample 800 600
  |> Image.intensify 50.0
  |> Image.toPngFile outfile

These pipelined functions are so easy on the eyes; it’s immediately obvious what’s going on. Unfortunately, it can be rather difficult to use pipelining with constructs that aren’t plain functions, such as objects and their methods.
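
The mechanics behind pipelining are simple: `x |> f` just means `f x`, and FSharp.Core defines the operator essentially as `let inline (|>) arg func = func arg`. The sketch below defines an equivalent function, `pipe` (a hypothetical name, chosen to avoid shadowing the built-in operator), and shows that a pipeline is just nested function application read left to right:

```fsharp
// Behaves like the built-in |> operator: apply the function to the value.
let pipe x f = f x

let twice x = x * 2
let addOne x = x + 1

// Three ways of writing the same computation.
let nested = addOne (twice 5)
let piped = 5 |> twice |> addOne
let withPipe = pipe (pipe 5 twice) addOne
```

All three evaluate to 11. This is why a pipeline stage must be a function whose last remaining argument is the value flowing through it.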

 

Wrapping an Object Oriented Library for Pipelining

To bring this seamless integration to F#, we must first wrap these object-oriented classes so that they can be used in a functional way. This is a rather simple task:

namespace Atalasoft.FSharp

open System.Drawing
open Atalasoft.Imaging
open Atalasoft.Imaging.Codec
open Atalasoft.Imaging.ImageProcessing
open Atalasoft.Imaging.ImageProcessing.Filters

module Image =

  type image = Atalasoft.Imaging.AtalaImage
  type PixelFormat = Atalasoft.Imaging.PixelFormat

  let fromFile (filename: string) =
    new image( filename )

  let toPngFile (filename: string) (img: image) =
    img.Save( filename, new PngEncoder(), null ) |> ignore

  let resample (width: int) (height: int) (img: image) =
    let newSize = new Size( width, height )
    let cmd = new ResampleCommand( newSize )
    cmd.Apply(img).Image

  let intensify (magnitude: double) (img: image) =
    let cmd = new IntensifyCommand( magnitude )
    cmd.Apply(img).Image

  let changePixelFormat (pf: PixelFormat) (img: image) =
    let changer = new AtalaPixelFormatChanger()
    changer.ChangePixelFormat(img, pf, null)

So, in this way, we can define modules that hide our object-oriented interfaces. However, let’s take a deeper look. There are some important things to keep in mind when designing functions for pipelining.

First, observe that the module definition encapsulates all of our pipelining functions and mandates how they will be accessed later.  It is good design practice to define all pipelining functions for the same type within one module.  This module should have the same name as the type but with the first letter capitalized.

Second, notice that we use type abbreviations to create local versions of the AtalaImage and PixelFormat types.  This makes our library code easier to read and allows us to use the F# lowercase type naming style.   Even more importantly, by defining all exposed types in this way the consumer of this module will not need to open any namespaces from the assemblies we are wrapping.

Third, see how image is the last parameter to each function?  To be able to pipeline into a function its final parameter must be of the to-be-pipelined type.

Note: This is not strictly true.  If you wish to pipeline into a function and then return a function with that value curried in, it can have additional parameters which will be filled in later. However, that’s a topic best left for another time.

Finally, note that the return type of each of the intermediate image processing functions is also image.  This ensures that the output can be pipelined directly into the next function.
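
These rules can be tried without DotImage. In the sketch below, a hypothetical `image` record stands in for `AtalaImage`, and the functions only adjust metadata rather than pixels. Because each function takes the image last and returns an image, a partial application such as `Image.resample 400 300` leaves a one-argument function that `|>` can feed directly; the data-first variant only fits the pipeline through a lambda:

```fsharp
// Hypothetical stand-in for an image type: metadata only, no pixels.
type image = { Width: int; Height: int; Intensity: float }

module Image =
    // Data-last: configuration parameters first, the image last.
    let resample (width: int) (height: int) (img: image) =
        { img with Width = width; Height = height }

    let intensify (magnitude: float) (img: image) =
        { img with Intensity = img.Intensity + magnitude }

    // Data-first variant, for comparison.
    let intensifyFirst (img: image) (magnitude: float) =
        { img with Intensity = img.Intensity + magnitude }

let start = { Width = 800; Height = 600; Intensity = 0.0 }

// Each partially applied call is an image -> image function, so the steps chain directly.
let viaDataLast = start |> Image.resample 400 300 |> Image.intensify 70.0

// The data-first function needs a lambda to fit into the same pipeline.
let viaDataFirst = start |> Image.resample 400 300 |> (fun i -> Image.intensifyFirst i 70.0)
```

Both pipelines produce the same record; the difference is purely in how naturally each function slots into a chain.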

 

Our F# Wrapper in Action

Let’s highlight the contents of this fsx file, hit alt-enter, and give it a whirl in the F# Interactive Window:

#r "Atalasoft.dotImage"
#r "Atalasoft.dotImage.Lib"
#r "Atalasoft.Shared"

#load "Image.fs"

open Atalasoft.FSharp

let processImage infile outfile =
  Image.fromFile infile
  |> Image.changePixelFormat Image.PixelFormat.Pixel32bppBgr
  |> Image.resample 400 300
  |> Image.intensify 70.0
  |> Image.toPngFile outfile

Finally, in F# Interactive we can simply type:

  > processImage @"C:\temp\Water lilies.jpg" @"C:\temp\Water lilies.png";;
  val it : unit = ()

and we have our processed image:

[Image: Water lilies, after processing]

 

Conclusion

Of course, this implementation leaves much to be desired. It hides the vast majority of the functionality available in our underlying library. Taken to its natural end, it leaves us with the unfortunate task of wrapping our entire library one command at a time, which is hardly an appealing prospect.

What if you wish to leverage existing objects without wrapping each individually?  Well, we will explore that next time.

F# Discoveries This Week 03/12/2010


Published to Rick Minerich's Development Wonderland by Richard Minerich March 12, 2010 21:03

Tons this week. Vladimir Matveev is my favorite new F# blogger, with very well-written data structure posts; Ashley Feniello continues his fantastic FScheme series; and Jomo Fisher posts some great Freebase and DGML examples. That’s just the tip of the F# iceberg, so do come inside.

 

Ashley Feniello’s FScheme Parts Twelve, Thirteen and Fourteen

The basic idea is to run a simulation by iterating a pure function from world state to world state. We’ll add a new ‘run’ primitive which will expect several user-defined functions to have been set up. The world state is initially produced by an ‘init’ function. Then every 30th of a second a ‘tick’ function is called to produce a new world state from the current state. Finally a ‘draw’ function will be called to render the world.

 

Luca Bolognese’s Updated Stock Prices, Divs and Splits Example

I’m working on a program to keep track of paired trades with trailing stops. I need to download stock prices, so I thought I might reuse some old code of mine. Here is the updated framework.

 

David Carlisle’s NAG F# Examples

NAG (Numerical Algorithms Group) is currently running a beta test of a NAG Library for .NET. One noticeable feature of the comments received so far is the relatively large number of users interfacing to the library from F# rather than C# or VB.NET.

 

Phillip Trelford’s The Associative Model of Data

But what if you wanted to extend the web store to have features like the online retailer Amazon, e.g. multiple sellers, recommendations, etc.? Answer: serious table and relationship proliferation. Enter an alternative model: the Associative model of data, a dynamic model where data is defined simply as items and links.

 

Jon Harrop’s F# vs Unmanaged C++ for Parallel Numerics

We obtained a surprising performance result when comparing optimized parallel ray tracers written in F# and C++ recently. The following two programs render the same highly complex scenes containing over a million objects. Surprisingly, the 136-line managed F# program runs slightly faster at 17s than the 168-line unmanaged C++ which takes 18s.

 

Vladimir Matveev’s F# and Iron Python

Today’s post will be devoted to various ways of integration between Iron Python and F#. I’ll try to skip the details of DLR configuration, because this is vast topic that worth separate post (maybe even a few posts). Instead I’ll focus on questions of integration.

 

Vladimir Matveev’s Data Structures: Finger Tree (Part 1)

What we’ll try to do in this post is to create the structure (based on 2-3 trees) with following characteristics.  Immutable (modification returns new instance of structure with changes applied),  Enqueue/Dequeue both in start and end in amortized constant time, and Concatenation support.

 

Vladimir Matveev’s Data Structures: 2-3 Tree

There are many special types of trees that perform insert/remove operation in intelligent way ensuring that result tree is small but branchy :). This trees are called self-balanced, most well-known of them are AVL trees, Red-black trees, 2-3 trees. This post is dedicated to the latter ones.

 

Vladimir Matveev’s Overview of F# Async Module and Event-based Async Pattern in F#.

This post I’d like to dedicate to reviewing functionality of Async module – creating and manipulating async computations.

 

Luis Diego Fallas’s Basic Image Processing Operations with F#

The previous post presented a way to access the image data from the Webcam using DirectShow.Net and F#. We can manipulate this data to do some basic image processing operations with it.

 

Jomo Fisher’s Extend your F# program with MEF and MEF in F# Scripts

The Managed Extensibility Framework is an interesting new technology in .NET 4.0.  This is a simple example in F#. This code sets up MEF hosting and asks for all extensions in the c:\extensions folder.

 

Jomo Fisher’s Neat Samples: F#, Freebase, DGML

I recently posted about the freebase web service here. This sample reads biological classifications and renders them in DGML. The result is a huge graph, here’s a little piece of it…

 

Jomo Fisher’s Neat Samples: F# and Bing API

Here’s another F# web service sample. This one uses the Bing Phone API to do a query. This time the code uses Xml instead of JSON and XmlDocument instead of a DataContract deserializer. This is pretty much a straight transliteration of one of the Bing SDK samples.

 

Jomo Fisher’s F# and Freebase

The web service at Freebase.com lets you access all sorts of structured data from a web service. Here’s a sample that shows you how to access this data from F#. It uses DataContract and the JSON serializer. The code below reads and prints the elements of the periodic table.

 

Julien Ortin’s BitTorrent in F# series: I/O Operations and Bitfield

One important thing is that a BitTorrent transfer is considered as a stream of pieces. So, if you have a 100-byte file, and a 400-byte one, and if the piece size is 200-byte long, the data from the piece need to be appropriately split (both when reading and when writing).  In this library, we use a reference to the AsyncWorker described on Don Syme’s blog.

 

Matt Moloney’s Drag and Drop using Rx and WPF in F#

I have recently been experimenting with combining Reactive X, WPF, and F# and have found the combination to be very palatable. I chose drag and drop as the test case because it is both non trivial and generally deeply stateful. The resulting Rx turns out to be one fifth the code of my original C#, much easier to read and has fewer errors.

 

Kean Walmsley’s Using jig from F# to create Spirograph patterns in AutoCAD

After my initial fooling around with turning AutoCAD into a Spirograph using F#, I decided to come back to this and bolt a jig on the front to make the act of making these objects more visual and discoverable.

 

Joh’s Thoughts about F# and Xbox games

I have been working for quite some time now on Asteroid Hunter. This has not left me much time for exploration with F#, but there is quite a bit a learned during the process anyway.

 

Richard Minerich’s Abstract Thoughts about F# Abstractions

My recent work on Professional F# 2.0 has left me thinking a lot about the nature of abstractions.

 

Oliver Sturm’s Creating a Lazy Sequence of Directory Descendants

I thought these code examples all look rather verbose – in the case of Clojure because in that way rather typical for Java, the APIs are pretty verbose to use, and in the case of C# because of all the syntactic, well, ahem, necessities, as well as the fact that there’s no language feature for integrating nested sequences seamlessly. Keeping it nice and simple, in F# that example can look like this…

 

Tormod Fjeldskar’s Tail Recursion in C# and F#

Tail recursion is essential in functional languages like F#, where iterative solutions are often implemented using recursion.
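
For readers new to the idea, here is a minimal sketch (function names are illustrative). The first function is not tail recursive: the addition happens after the recursive call returns, so each element costs a stack frame and very long lists can overflow the stack. The second carries a running total in an accumulator, making the recursive call the last thing it does, which the F# compiler can turn into a loop:

```fsharp
// Not tail recursive: the addition happens after the recursive call returns.
let rec sumNaive (xs: int list) =
    match xs with
    | [] -> 0
    | x :: rest -> x + sumNaive rest

// Tail recursive: the recursive call is the final action,
// with the running total carried in an accumulator.
let sumTail (xs: int list) =
    let rec loop acc xs =
        match xs with
        | [] -> acc
        | x :: rest -> loop (acc + x) rest
    loop 0 xs

let total = sumTail (List.replicate 1000000 1)
```

Both functions return 5050 for `[1..100]`; only the accumulator version is safe to use on very long lists.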