Image Analysis for Biologists’ Microscopy Images

January 28, 2010

The paper is:

Ljosa V, Carpenter AE (2009) Introduction to the Quantitative Analysis of Two-Dimensional Fluorescence Microscopy Images for Cell-Based Screening. PLoS Comput Biol 5(12): e1000603. doi:10.1371/journal.pcbi.1000603

And since it’s a PLoS journal, there is an online version available. Yay for open access.

This paper is a tutorial, and whilst I’m neither a biologist nor an image analyst, I know a bit about both and I found the level to be just about right for me. I think the intended readership is those overworked postdocs who are just about to design the protocol for a 10,000-slide experiment. Rather than attempting a comprehensive overview, they keep returning to their primary example of “a cell-based fluorescence microscopy assay for DNA-damage regulators”. In other words, they took pictures of cells and counted the number of places where the DNA was damaged. In laying the groundwork they give a good number of motivating examples, and also what looks to be a good selection of more comprehensive reviews and further reading.

I find their example quite good, and I expect that, at least in broad outline, the image pipeline they illustrate and the particular techniques they discuss will be applicable not just to other kinds of biological assay but to image analysis more generally. They discuss quite a few image analysis techniques and their relevance to biology. For example, they explain the importance of correcting for uneven illumination by the microscope, an effect that might be barely noticeable to the human eye but which can disrupt image processing algorithms.
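As a rough illustration of what illumination correction involves, here is a minimal sketch of one common retrospective approach. It assumes NumPy is available and that many images from the same run share the same illumination pattern: average them so the cells wash out, smooth the average, and divide each image by the normalised result. The function names, the box filter and its radius are my own illustrative choices, not the paper’s prescription.

```python
import numpy as np


def box_smooth(img, radius):
    """Mean filter with edge replication (a crude stand-in for the
    larger smoothing filters used in practice)."""
    padded = np.pad(img.astype(float), radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros(img.shape, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / size**2


def illumination_function(images, radius=2):
    """Estimate the illumination pattern from a stack of same-run images."""
    mean = np.mean(np.stack(images), axis=0)   # cells average out
    illum = box_smooth(mean, radius)           # keep only slow variation
    return illum / illum.mean()                # mean 1, so overall brightness is kept


def correct(img, illum):
    """Divide out the illumination pattern."""
    return img / illum


# Synthetic example: a flat specimen under light that brightens left to right.
gradient = 1.0 + 0.1 * np.arange(10)           # hypothetical illumination falloff
raw = 100.0 * np.tile(gradient, (8, 1))        # 8 x 10 image
illum = illumination_function([raw, raw, raw])
corrected = correct(raw, illum)
```

Away from the left and right edges (where edge replication distorts the box filter) the corrected image comes out flat. In real use the stack would contain many different fields of view rather than copies of one.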

There are two things I’m surprised they do not mention: non-linear intensity recording, and spectrally selective filters. By non-linear intensity recording I mean the fact that many image formats do not record a linear representation of light intensity, but gamma correct it first. Gamma correction is incredibly useful for recording images intended to be viewed by humans, but it may interfere with image processing algorithms. Who knows what a proprietary microscope does, but let’s hope it’s well documented. Incidentally, I do wonder whether this gamma-oblivious attitude is responsible for their comment that “working with the logarithm of the intensities is often helpful because it can reduce the skewness of the intensity data”, because the logarithms of linear and gamma-corrected intensities differ only by a constant factor: the exponent used for gamma correction. The right place to discuss this would be in box 2, just underneath “Image file bit depth”.
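The constant-factor claim is easy to check numerically. This sketch assumes a pure power-law encoding with an exponent of 2.2; real formats (sRGB, for one) use a piecewise curve, and a given microscope may do something else entirely. The function names are mine.

```python
import math

GAMMA = 2.2  # assumed pure power law; real formats such as sRGB use a piecewise curve


def gamma_encode(linear):
    """Gamma-correct a linear intensity in (0, 1], as a display-oriented format might."""
    return linear ** (1.0 / GAMMA)


def gamma_decode(encoded):
    """Undo the encoding to get back to linear intensity before measuring anything."""
    return encoded ** GAMMA


samples = [0.01, 0.1, 0.5, 0.9]   # avoid 1.0, whose log is zero
ratios = [math.log(gamma_encode(x)) / math.log(x) for x in samples]
# every ratio is 1 / GAMMA (about 0.4545), whatever the intensity
```

Since log(encoded) = log(linear) / GAMMA, the two log-transformed data sets have the same shape, just scaled.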

A spectrally selective filter is one that passes only a narrow band of optical wavelengths. In their figure 1 they show a colour source image being split (by channel) into images that reflect markers for DNA, cytoplasm, and DNA-damage respectively. It seems to me that careful use of spectrally selective filters would enable this step to be performed more accurately and reliably in the microscope at the image capture stage. It ought to enable many more markers to be used as well. Perhaps I show my biological naïvety here, but filters are commonly used in astronomy, and I’m surprised they don’t get a mention at all.

For working biologists I expect that the practical advice in box 2 will prove invaluable. It has all sorts of juicy nuggets, from using microplates with black sidewalls for laser-based autofocussing, to avoiding photographing the edge of a well, to not opening the lab door while a series is being photographed. And avoid JPEG.

On the whole I found this quite a useful introduction, well written, and occasionally fun too. I now have a few more items for my reading list and at least one image processing algorithm to implement.


Global nitrogen deposition and carbon sinks

July 29, 2009

“Global nitrogen deposition and carbon sinks”. Dave S. Reay; Frank Dentener; Pete Smith; John Grace; Richard A. Feely. Nature Geoscience. 2008-07. I read the PDF that I found by googling for the title and clicking on the first PDF in the results. Pay-walls suck.

A paper about nitrogen’s rôle in the carbon cycle, looking at what we know about how nitrogen influences the major carbon sinks (forests and soil on land, and the ocean). By the way, Nr is reactive nitrogen (they use this symbol, and so shall I).

Probably worth reading if you are starting to think about the secondary effects of emissions. But don’t let it distract from the big picture. We’re burning too much carbon, and it’s not going to help with that.

I am not a climate scientist. But this is not a deep paper; it is mostly an overview. It’s 8 pages long and mostly summarises other works; it references 92 papers (and still misses at least one: they make use of the IPCC SRES scenarios, but fail to reference the IPCC Special Report on Emissions Scenarios). The paper does not discuss the nitrogen cycle at all (despite being a paper about nitrogen deposition). Nor does it discuss greenhouse gases other than CO2; in particular N2O, itself a greenhouse gas, is only discussed as a reactive nitrogen emission (from soil, for example) and for its effect in nitrating a CO2 sink. This seems odd. But incorporating the nitrogen cycle and other greenhouse gases at the same time would be potentially confusing, and would lead to a much less accessible paper.

It seems very comprehensive. I have not read most of the referenced works (in fact, I think I’ve read only one: part of the IPCC Fourth Assessment Report), but they seem to be reasonably summarised, and the paper as a whole covers a lot of ground in its 8 pages. The paper first discusses current emissions and their likely increase (not everywhere; European reactive nitrogen emissions are likely to decline). The rest of the paper is split between the 3 main carbon sinks: forest, soil and ocean.

The main theme of the paper is uncertainty. Having reviewed the available literature, the authors find that the effects of reactive nitrogen have so far been difficult to quantify. For example, the effect of reactive nitrogen on carbon sequestration in the boreal forest is summarised as being somewhere between 40 g C per g Nr and 200 g C per g Nr. Quite a wide range: a factor of five.

The bottom line is… yes, a bit of extra sequestration in the oceans, and some in the boreal forest. Not enough is known about the tropical forest, which is a shame, because it looks like that’s where a lot of the future Nr is going to get dumped. Overall, the extra sequestration will be noticeable (amounting to not more than 3 billion tonnes of CO2 per year), but not really enough to make a useful difference. And much of that benefit is negated by the greenhouse gas emissions themselves.

Niggles relegated to an appendix

Mostly the text uses petagrammes (Pg), as in “emissions reached 7.2 Pg of carbon per year”, but the diagrams, borrowed from the IPCC, use gigatonnes (Gt). These are actually the same quantity (1 Pg = 10¹⁵ g = 10⁹ t = 1 Gt). It would be better to choose one unit and stick to it.
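Since the text and the borrowed diagrams mix units, a small conversion helper makes the relationships explicit. This is a hypothetical sketch of my own, not anything from the paper. The only physical constants it uses are the molar masses of carbon and CO2, which matter because some figures are quoted as mass of carbon and others (like the 3 billion tonnes of CO2 mentioned earlier) as mass of CO2.

```python
# Masses expressed in grams; a tonne is 10**6 g.
GRAMS = {
    "g": 1.0,
    "Tg": 1e12, "Mt": 1e12,   # teragram = megatonne
    "Pg": 1e15, "Gt": 1e15,   # petagram = gigatonne
}

# Molar masses (g/mol): each gram of carbon corresponds to about 3.66 g of CO2.
C_TO_CO2 = 44.01 / 12.011


def convert_mass(amount, from_unit, to_unit):
    """Convert a mass between the prefixed units above."""
    return amount * GRAMS[from_unit] / GRAMS[to_unit]


def co2_to_carbon(mass_co2):
    """Mass of carbon contained in a given mass of CO2 (same units in and out)."""
    return mass_co2 / C_TO_CO2


# The upper bound of 3 billion tonnes of CO2 per year, expressed as carbon:
upper_bound_pg_c = co2_to_carbon(convert_mass(3.0, "Gt", "Pg"))
```

Here `upper_bound_pg_c` comes out at roughly 0.82 Pg of carbon per year, which is the figure to compare with the “7.2 Pg of carbon per year” of emissions.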

On two pages we see global maps comparing the distribution and strength of nitrogen deposition: over land and oceans on page 432, and over the ocean on page 434. There are several problems with these maps, mostly in their inconsistent presentation. The first set of maps, page 432, shows the current (year 2000) nitrogen deposition and two different projections for the year 2030. The second set, page 434, shows the pre-industrial distribution of nitrogen deposition, current (1990s) deposition, and a projection under the SRES A1FI scenario. One set of maps puts the prime meridian in the centre and runs from -180 to +180; the other puts the prime meridian at the left and runs from 0 to +360. Latitudes are marked “60°N, 30°N, EQ” on one set, and “90°N, 45°N, 0°N” on the other. The maps are different sizes. One set uses g N m⁻² yr⁻¹, the other mg N m⁻² yr⁻¹ (and then, later in the text, kg Nr ha⁻¹ yr⁻¹; garhh!). The scales are different. The number and selection of colours is different. One set is labelled “Global distribution of total Nr deposition”, the other “Global distribution of oceanic nitrogen deposition”. Nr is the symbol they introduce for reactive nitrogen. Are the oceanic depositions reactive nitrogen? Probably, but they don’t say so, which introduces ambiguity.
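The three deposition units are at least easy to reconcile. Here is a toy helper of my own (not from the paper) that puts them on a common footing; the key fact is that a hectare is 10⁴ m², so 1 g N m⁻² yr⁻¹ is 10 kg N ha⁻¹ yr⁻¹.

```python
MASS_G = {"mg": 1e-3, "g": 1.0, "kg": 1e3}   # mass units in grams
AREA_M2 = {"m2": 1.0, "ha": 1e4}              # area units in square metres


def convert_flux(value, mass_from, area_from, mass_to, area_to):
    """Convert a deposition rate (mass per area per year) between units.
    The 'per year' part is common to all three map units, so it cancels."""
    grams_per_m2 = value * MASS_G[mass_from] / AREA_M2[area_from]
    return grams_per_m2 * AREA_M2[area_to] / MASS_G[mass_to]


# 1 g N m-2 yr-1 expressed in the other two units used in the paper:
in_mg = convert_flux(1.0, "g", "m2", "mg", "m2")     # 1000 mg N m-2 yr-1
in_kg_ha = convert_flux(1.0, "g", "m2", "kg", "ha")  # 10 kg N ha-1 yr-1
```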

I know why the maps are like that. It’s because they were borrowed from other papers. But this points to a problem in the scientific community. It should be easy to take visualisations from different sources and massage them into a consistent presentation format. The fact that it’s obviously not easy is bad.

Incidentally, the maps use the equirectangular projection. They don’t say so, and I never knew its name until I looked it up on Wikipedia for this review. I still find it a strange projection to use, but it does seem to be common in scientific communities.
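Part of its appeal is surely that it is about as simple as a projection gets. Assuming its most common form (standard parallel at the equator, often called plate carrée), longitude and latitude map linearly onto x and y, and reconciling the two longitude conventions used by the maps is a single modulo operation. The function names here are my own.

```python
def wrap_longitude(lon):
    """Map a longitude in degrees (e.g. from the 0..360 convention) into [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def equirectangular_pixel(lon, lat, width, height):
    """Plate carree: x is proportional to longitude, y to latitude.
    Returns (x, y) with x = 0 at 180 W and y = 0 at 90 N."""
    x = (wrap_longitude(lon) + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y
```

For example, 270° in the 0 to 360 convention is -90° (90°W) in the other, and `equirectangular_pixel(0, 0, 360, 180)` puts the intersection of the prime meridian and the equator at the centre of a 360 × 180 image.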

Oh yeah, and for some reason the band from 45°S to 90°S on the oceanic set is strangely squashed.


Lambda: The Ultimate Goto

July 16, 2009

Debunking the “Expensive Procedure Call” Myth; or, Procedure Call Implementations Considered Harmful; or, Lambda: The Ultimate GOTO. Guy Lewis Steele Jr. October 1977. MIT AI Memo 443.
PDF version hosted on this blog.

Essential reading for all computer scientists, and for anyone wishing to implement a language. I should’ve read this paper when it was half its age.

It’s 1977 and Steele is writing memos from an alternate reality. It’s a little hard to place oneself in this alternate universe: “Some programmers fear that their expressive power or style will be cramped if GOTO is taken away from them.” Steele does not think this fear is a good thing (hence the memo); the quote is just a sign of how things were in that era. Was this really only 2**5 years ago?

His concerns about the subroutine’s perceived inefficiency are now laughable, and that’s no doubt partly due to Steele’s efforts in this memo. His other concern is the conflict between abstract programming concepts and concrete language constructs. And that concern is still valid.

Steele uses LISP (yes, in capitals!) for his examples with no introduction or explanation. Cruel. But then, it is a Steele memo, and LISP has been around for 17 years already; it’s no new kid on the block (it has a whole 4 years on PL/I, for example).

Part A shows how splitting up the traditional notion of what a procedure call might mean allows procedure calls to be implemented efficiently, and also to be used for tail calls. In other words, you don’t need to treat tail calls as a special case: organise your compiler properly and you’ll give the programmer a tool with which to express them. This is a good thing, and it crops up again later on.
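Python is a handy counter-example, because it does not eliminate tail calls, so deep mutual recursion overflows the stack. A common workaround, swapped in here purely as an illustration and not Steele’s technique, is a trampoline: each tail call returns a zero-argument function (a thunk) instead of making the call, and a driver loop keeps invoking thunks. That is roughly the “procedure call as GOTO with arguments” idea, done by hand. The `is_even`/`is_odd` pair is a hypothetical example.

```python
def trampoline(fn, *args):
    """Call fn(*args), then keep calling the result while it is a thunk.
    Caveat: a genuine callable return value would be mistaken for a thunk."""
    result = fn(*args)
    while callable(result):
        result = result()
    return result


def is_even(n):
    # Instead of calling is_odd directly (a tail call), hand back a thunk.
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))


print(trampoline(is_even, 100_000))  # True, with constant stack depth
```

Written as plain mutual recursion, the same call would hit Python’s recursion limit long before reaching zero.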

He has an amusing rant about how, in most languages, the syntax of procedure calls gives them a flavour distinct from built-in operators. For example, you can’t pass a Fortran statement function as an argument, and you have to use «CALL ... USING» in COBOL. This part still rings true. In Python we can’t pass «+» as a function (though we do have «operator.add»), and «(2).__mul__» is not the same as «lambda x: 2*x».
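The Python point is easy to check. In this quick sketch, «+» is syntax rather than a value, so `functools.reduce` needs `operator.add`. The bound method and the lambda agree on integers but part ways on a float, because `int.__mul__` returns `NotImplemented` for a float argument, whereas the `*` operator then falls back to `float.__rmul__`.

```python
import operator
from functools import reduce

total = reduce(operator.add, [1, 2, 3, 4])   # reduce(+, ...) would be a syntax error

double_method = (2).__mul__
double_lambda = lambda x: 2 * x

print(total)                                 # 10
print(double_method(3), double_lambda(3))    # 6 6
print(double_method(2.5))                    # NotImplemented
print(double_lambda(2.5))                    # 5.0
```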

In Part E (one of the best bits), Steele shows how Yourdon’s rat’s-nest state machine can be transformed from the “traditional” implementation, with an explicit state variable and a loop, into a “procedural implementation”. Steele considers this “structured” (“structured” as in no GOTOs, a buzzword of the time), and points out a further benefit: state transitions can pass information to each other as procedure parameters, rather than through global (shared) variables. I would put it slightly differently: the liveness information the programmer gives the compiler is more honest.
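To make the contrast concrete, here is a sketch using a hypothetical machine of my own (Yourdon’s example is not reproduced here) that recognises an optionally signed decimal integer and returns its value, or `None` on a syntax error. The first version uses a state variable and a loop, with `sign` and `value` shared by every state. In the second, each state is a procedure and each transition is a tail call that passes only what the next state needs. Because Python does not eliminate tail calls, the procedural version is bounded by the recursion limit (roughly a thousand characters by default) and is for illustration only.

```python
DIGITS = set("0123456789")


# Traditional: an explicit state variable and a loop.
def parse_int_loop(s):
    state, sign, value = "START", 1, 0
    for ch in s:
        if state == "START" and ch in ("+", "-"):
            sign, state = (-1 if ch == "-" else 1), "SIGNED"
        elif ch in DIGITS:            # START, SIGNED and DIGITS all accept a digit
            value, state = value * 10 + int(ch), "DIGITS"
        else:
            return None               # any other character is a syntax error
    return sign * value if state == "DIGITS" else None


# Procedural: one procedure per state; transitions are tail calls with parameters.
def state_start(s, i):
    ch = s[i:i + 1]                   # "" once the input is exhausted
    if ch in ("+", "-"):
        return state_signed(s, i + 1, -1 if ch == "-" else 1)
    if ch in DIGITS:
        return state_digits(s, i + 1, 1, int(ch))
    return None


def state_signed(s, i, sign):
    ch = s[i:i + 1]
    return state_digits(s, i + 1, sign, int(ch)) if ch in DIGITS else None


def state_digits(s, i, sign, value):
    ch = s[i:i + 1]
    if ch == "":
        return sign * value
    if ch in DIGITS:
        return state_digits(s, i + 1, sign, value * 10 + int(ch))
    return None


def parse_int_proc(s):
    return state_start(s, 0)
```

Note that in the procedural version `state_start` never sees a `value`, and `state_signed` never sees one either until it creates it: the parameters say exactly what each state depends on.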

This part, implementing a state machine using procedures, is merely one component of a larger argument: that programming concepts and programming language constructs do not have a one-to-one correspondence. That is, though we have constructs like procedures for encapsulating modularity, and WHILE for iteration, we might use assignment to implement modularity (and yes, he has quite a good example of this), and procedures for iteration. And, he argues further, it is not up to the language implementor to guess what the programmer might do with each language construct. Implementors should give programmers all the reasonable tools they can, “otherwise, programmers will merely misuse the constructs they are given”.

This point is expanded in a note (“Various Optimizations”): there should not be just one way to compile a particular language construct; instead, compiler writers should “try to determine from a given program the best of all possible interpretations and produce code accordingly.” Sound, but somewhat glib.