Thursday 1 May 2008

Open Source Sequencing and Colossal Squid

The Polonator has a really cheesy name (Arnie dressed as a giant bee?), but it's a really important step in molecular biology. It's a DNA sequencing machine; there are lots of these around at the moment, as people have managed to make systems that beat the traditional dideoxy Sanger sequencing method (which was used to sequence the human genome) in speed and in the bulk cost of sequencing. Examples include 454's pyrosequencer and the dubiously capitalised SOLiD system. Both produce substantially shorter reads (single sequences of DNA) than the ABI 3700s used to sequence the human genome (though I've heard rumours that a new 454 machine that can do 500 bp reads is on the cards), but do millions at a time.

If you want to run one of these systems, you purchase it for hundreds of thousands from the manufacturer, get trained to run it by the manufacturer and buy the consumables for it from the manufacturer. You don't necessarily know what the reagents in these consumables are, and therefore all the troubleshooting you do involves the manufacturer too. This is because these machines are expensive to make, not many people buy them and they become defunct very quickly. It ties labs to the machine, probably long past the time when they're cutting edge.

In some ways it's silly to keep these reagents secret. I worked for AstraZeneca for a year as an undergraduate (which was great experience - companies are much stricter about lab protocol than universities, and you pick up good habits!) and was involved in validating a machine they purchased for typing bacteria - the RiboPrinter Microbial Characterisation system. We bought their kits and never knew what was in them, but another student and I, frustrated that we had to write a report for university that would contain no science at all because we didn't know what was in these kits, sat down and attempted to work out what they contained. Judging by the comments from the sales reps from DuPont, who make the system, our estimation was pretty close. If two undergraduates can work out what's in the various solutions (which is a long way from making a working machine, but at least would mean we weren't reliant on buying inevitably more expensive proprietary reagents), what's the point in the secrecy anyway?

The Polonator is different in that the system is open source. They give you full specs of the machine and full details of the reagents required: what's in them and at what concentrations. You can get the machine, make your own protocols for it and share them with other users. According to the article in Technology Review (via Digg), users are already collaborating to improve the chemistry.

Surely this is how science should be done? The philosophy is that scientists share their work and that it's peer-reviewed: before grants are awarded, after papers are written, and by the community at conferences and after publication. This should allow everyone to build on each other's work and mean that resources aren't wasted - molecular biological research is not cheap to do. In reality, this isn't what happens. Labs have their own protocols for performing experiments that might be shared amongst collaborators, but more rarely with the whole scientific community unless they represent a ground-breaking achievement. Minor improvements to protocols that could save time and money are shared perhaps by word of mouth. Negative results are generally not reported, meaning that identical experiments could be performed again, perhaps many times, in another lab, wasting resources again.

There are good reasons why some information shouldn't be shared - perhaps to allow a patent to be taken out on the intellectual property to prevent someone nicking the idea for their own gain. But I believe that in the majority of cases there are more benefits to sharing results. There's not a lot of truly open source science going on out there currently. Another example is the OpenWetWare project, which aims to make sharing information and protocols amongst bioscientists easier - to the extent that they encourage lab notebooks to be kept online on their wiki. It takes brave souls to get involved initially, posting up their failed and frustrating experiments along with the successes, but once more people are involved I reckon the benefits will speak for themselves.

Hopefully more projects like this will be initiated and we can have a truly open, more productive scientific community.

On a completely different topic, a group of scientists at the Museum of New Zealand Te Papa Tongarewa have been investigating three Giant Squid (Architeuthis dux) and one Colossal Squid (Mesonychoteuthis hamiltoni - it's bigger; I reckon they're reserving the name Ginormous Squid for the next bigger one they find) that they've had on ice. The blog is fascinating, with pictures of the biggest eye in nature and some clips from the dissection. I like squid, and own a T-shirt to prove it - linky.

Thursday 13 March 2008

Venter at TED

TED2008 finished at the beginning of the month and there's been a flurry of new videos of the talks (generally about 15 minutes in length) added to the site. One of the new talks is by Craig Venter - I've posted before about his last TED talk here - and like last time I find his new talk fascinating, but it leaves me in two minds.

The science is very impressive and his delivery is understated, so that when he says things like "replacing the entire petrochemical industry" or "producing fuel from sequestered CO2 within 18 months" it seems all the more so. On the other hand, does he actually answer the question at the end about using the technology he's creating to produce bioweapons? He gives reasons why it is currently unlikely, but is it actually impossible? That's probably unreasonable of me - you can use all sorts of technologies to make weapons that are also used to produce entirely innocuous and useful products.

And the product he's talking about here is to divert CO2 waste streams from manufacturing and make fuel directly from them using synthetically constructed microorganisms. Though this doesn't actually remove CO2 from the atmosphere, it does remove the emission - perhaps permanently, if the CO2 released when you burn this new fuel is recycled as well. Climate change undeniably needs radical solutions, and this certainly is one.

I'm trying to look past the presentation to the heart of his talk, to find out what he's actually saying and what I think about it; meanwhile, see for yourself below.

Wednesday 16 January 2008

In Praise of the Pedestrian

Happy New Year - my posting lapse was because things got busy before Christmas. They aren't really any less busy now, but I'm attempting to be more organised.

Ray Bradbury is my favourite author. He's had some competition from Neil Gaiman and, more recently, Philip Reeve, but he still retains the top spot. My favourite novel is Fahrenheit 451, which Bradbury based on a short story called The Pedestrian. What Wikipedia fails to record is Bradbury's anecdote about the incident that inspired the original story - I am recounting this from memory, as it comes from the introduction to an edition of Fahrenheit 451 that I no longer have.

The author was out on a late-night stroll along the street, perhaps enjoying the clemency of the weather or a view of the stars, when a police patrol car pulled up beside him. The policeman shouted across, asking what he was doing. "Walking" was the reply. Not believing that there could ever be a reason for simply walking that wasn't nefarious, and unable to envisage any benefit that could be gained from the activity, the policeman continued to question him. Bradbury was eventually able to convince him not to make an arrest and headed home, probably indignantly, to compose The Pedestrian.

Currently I am doing some work with an emeritus professor in the department, Ted Maden. He has retired, but still has an office and carries on researching things that take his interest, albeit without funding or a salary. He has discovered a number of pseudogenes in the human genome sequence; these are remnant genes that no longer function, but they can tell you something of the evolutionary history of the active version of the gene.

Ted learnt his molecular biology before computers were used to analyse DNA and protein sequences, and certainly before whole genomes of any organism were sequenced. I am helping him by obtaining further sequences from sequenced genomes, as I have some familiarity with how to extract these from the databases.

One of the first steps in comparing different versions of a gene is to produce an alignment, where you line up in columns all the base pairs along the length of the DNA sequence against the equivalent positions in the other versions. Sometimes you have to add gaps to a sequence to compensate where base pairs are missing or additional base pairs have been added. Sometimes sections of the gene have evolved quickly and changed so much that you can't really align two sequences against each other with any certainty, and you have to make informed judgements as to which base should go where. Computer programs such as ClustalX or ARB can do this relatively quickly; they can handle a large number of sequences at once and generally do a good job of aligning, though I always check and edit alignments by hand.
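
To give a flavour of what these programs are doing underneath, here is a minimal sketch in Python of global pairwise alignment (the classic Needleman-Wunsch approach). The scoring values are arbitrary choices of mine for illustration; real tools like ClustalX use far more sophisticated scoring and handle many sequences at once.

# A toy Needleman-Wunsch global aligner - illustrative only, not ClustalX.
# Scoring values below are arbitrary choices for the sake of the example.
MATCH, MISMATCH, GAP = 1, -1, -2

def align(a, b):
    # Fill a dynamic programming matrix of best alignment scores.
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        score[i][0] = i * GAP
    for j in range(cols):
        score[0][j] = j * GAP
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score[i][j] = max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)
    # Trace back from the bottom-right corner to recover the aligned strings.
    out_a, out_b, i, j = [], [], len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                MATCH if a[i - 1] == b[j - 1] else MISMATCH):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))

for row in align("GATTACA", "GCATGCA"):
    print(row)

The gaps Ted draws by hand are exactly the '-' characters this traceback emits; the hard cases he resolves by judgement are the ones where several traceback paths score equally well.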

Ted produces all his alignments by hand on ruled sheets of A3 paper, writing out each of the thousands of bases, checking and double-checking the sequence and the alignment. The wadges of completed alignment are held together by bulldog-clips and stored in map-drawers. He then annotates them and collates details of mutations that have occurred between the different sequences in the alignments, noting the locations, types and frequencies.

All the things that Ted does are possible using modern bioinformatics. I could apply an algorithm to count CpG mutations and locate restriction sites, but what I can never do is gain the deep understanding of the sequences that Ted has built up by aligning things his way. He spots sequence features that I wouldn't know to screen for, and corrects sections of my alignments that have completely foxed the computer alignment algorithm. There is simply no substitute for quality.
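
By way of illustration, the kind of screening I mean is only a few lines of Python. This is a toy sketch of my own (the sequences and the choice of EcoRI's GAATTC site are invented for the example): it tallies positions where a CpG in one aligned sequence has been lost in another, and locates restriction sites.

# Toy sketch: tally lost CpG dinucleotides between two aligned sequences
# and locate restriction sites. Sequences and enzyme are invented examples.

def cpg_losses(ref, other):
    # Both inputs are gapped, aligned sequences of equal length.
    return [i for i in range(len(ref) - 1)
            if ref[i:i + 2] == "CG" and other[i:i + 2] != "CG"]

def restriction_sites(seq, site="GAATTC"):  # GAATTC = EcoRI recognition site
    bare = seq.replace("-", "")  # strip alignment gaps first
    return [i for i in range(len(bare) - len(site) + 1)
            if bare[i:i + len(site)] == site]

ref = "ACGTACGGAATTCCG"
mut = "ACATACGGAATTCTG"
print("CpG changes at alignment positions:", cpg_losses(ref, mut))
print("EcoRI sites at sequence positions:", restriction_sites(mut))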

Sequencing technology is getting faster and cheaper. Complete genome sequences are published with increasing regularity. The NCBI currently lists 623 complete microbial genomes with 915 in progress (link). The first microbial genome sequence to be released (excepting some small viral genomes) was that of Haemophilus influenzae in July 1995, and it took many months to produce. Now it could be resequenced on a timescale more appropriately measured in hours. However, annotation of the resulting sequence - identifying the genes found - has not kept pace with our sequencing ability, and our understanding of how these organisms work based on the sequence generated has increased at a slower rate still. When a new bacterial genome is sequenced, the number of genes that are unidentifiable is significant: I've heard that on average only 30% of the genes on a new genome can actually be identified; the rest are marked as hypothetical.

My point is that we are getting faster and faster at producing larger and larger amounts of data, but not making similar gains in understanding the data we're producing. Ted's approaches might seem preposterous when compared against work comparing multiple entire genomes to one another, but such comparisons can only skim the surface of the information that's there.

In The Pedestrian, the protagonist ends up being taken to the Psychiatric Center for Research on Regressive Tendencies. Ted's hoping to publish our findings; hopefully his methodology will be well received. Otherwise, next post from Bedlam.

Tuesday 23 October 2007

Excuse me Sir? May I see your papers?

It is approaching that time again. My life is characterised by two years of calm followed by a year of building tension thinking "what the hell am I going to do next?". My contract at Liverpool finishes in March and I'm starting to look for jobs.

My first plan was to apply for NERC and Royal Society fellowships. The advantage of these schemes is that they allow you the autonomy to research what you want (in line with the original proposal, otherwise the funding bodies might get grumpy), and once you have fellowship status you can apply for grants as principal investigator. Also, universities are strongly encouraged to make research fellows permanent members of staff after the end of the funding, though this is by no means guaranteed. The downside is that they are very competitive: the NERC fellowship covers all the natural environment disciplines, from counting birds and geology to molecular biology, and the RS fellowship is open to all scientific disciplines. Applications therefore have to be exquisitely constructed things of intricate and arse-covering beauty.

I had an idea for a proposal - I think it was a good one - and I had people who wanted to collaborate and also thought that it would work, or at least be interesting. However, I then sat and thought about my publication record.

As a postdoc, papers are everything. They are the only currency that employers and funders understand; they define your career and symbolise your scientific ability. Whether this should be the case, or whether they are a fair representation, is moot. You have to have publications to prove that you are worth employing, and that is that. An academic who had been on one of the NERC fellowship review panels reckoned he'd hope to see three first-author papers published in a good-quality microbial ecology journal from a NERC fellowship applicant in microbial ecology. 'Good-quality journal' gets us into the realm of impact factors. Journals are basically judged by the number of times that the papers in them are cited in other papers. It's like climbing up Google's hit list, but harder and with less reward.
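
For what it's worth, the calculation behind an impact factor is simple arithmetic: a journal's figure for year Y is the citations received in Y by items it published in the two preceding years, divided by the number of citable items in those two years. A throwaway sketch in Python (the numbers are invented, chosen to land near Environmental Microbiology's ballpark):

# Impact factor for year Y: citations in Y to items published in Y-1 and
# Y-2, divided by the number of citable items published in Y-1 and Y-2.
def impact_factor(citations_in_year, items_in_prev_two_years):
    return citations_in_year / items_in_prev_two_years

print(round(impact_factor(1852, 400), 3))  # invented figures -> 4.63

This is also why a brand-new journal can't have one yet: you need two years of published articles plus a year of citations to them before the sum means anything.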

The other critical point is 'first-author paper'. There are two positions you want to be in the list of authors on a biology paper (I'm not sure this necessarily holds for other disciplines): first, or last. First broadly means that you wrote it and contributed significantly to the scientific content. Last means that you concocted the proposal, got the funding, supervised the work and probably edited the paper. The positions in between have a sort of sliding scale of worth, with slap-bang in the middle being the bottom of the pecking order. Experiments are never as straightforward as 'Bob did the work and Alice got it funded', so there can be jockeying for position and everything gets political, particularly if more than one group was involved in the research.

Which brings me to my own publication record. Brace yourselves, this is not pretty; it is a Frankensteinian creation of little consistency. I start at the very beginning (I'm told it's a very good place to start):

1. Isolation of viruses responsible for the demise of an Emiliania huxleyi bloom in the English Channel.
Wilson, W. H., Tarran, G. A., Schroeder, D., Cox, M., Oke, J., and Malin, G.
Journal of the Marine Biological Association of the UK, 2002. PDF.

2. Aminobacter ciceronei sp. nov. and Aminobacter lissarensis sp. nov., isolated from various terrestrial environments.
McDonald, I. R., Kampfer, P., Topp, E., Warner, K. L., Cox, M. J., Connell Hancock, T. L., Miller, L. G., Larkin, M. J., Ducrocq, V., Coulter, C., Harper, D. B., Murrell, J. C., and Oremland, R. S.
International Journal of Systematic and Evolutionary Microbiology, vol. 55, pp. 1827-1832, 2005. Abstract.

3. Use of DNA-stable isotope probing and functional gene probes to investigate the diversity of methyl chloride-utilizing bacteria in soil.
Borodina, E., Cox, M. J., McDonald, I. R. & Murrell, J. C.
Environmental Microbiology, vol. 7, pp. 1318-1328, 2005. Abstract.

4. Stable-isotope probing implicates Methylophaga spp. and novel Gammaproteobacteria in marine methanol and methylamine metabolism.
Neufeld, J. D., Schafer, H., Cox, M. J., Boden, R., McDonald, I. R., and Murrell, J. C.
The ISME Journal, vol. 1, pp. 480-491, 2007. Abstract.

5. A multi-loci characterization scheme for Shiga-1 toxin encoding bacteriophages.
Smith, D. L., Wareing, B. M., Fogg, P. C. M., Riley, L. M., Spencer, M., Cox, M. J., Saunders, J. R., McCarthy, A. J., and Allison, H. E.
Applied and Environmental Microbiology (accepted 08/10/2007). Abstract.

Those are the ones that are out at the moment; there is one more currently going through the submission process that I won't jinx by listing, and three more in various stages of writing that should, hopefully, all be published at some point.

Now, what's the first thing you notice about those five beauties? Firstly, where are the first and last authorships? In a couple of those papers I do an excellent job of squatting in the perfect centre, the nadir of the author list. I should have at least one first-author paper from my PhD, and this is currently in production, but the delay in writing (purely my own fault) has meant that I have a gap between 2005 and 2007. Not good.

Secondly, what's the theme here? Is it immediately apparent that I am a focussed researcher with a consistent and driven career plan in molecular marine microbial ecology? Is it buggery. I start with marine algal viruses (the work I contributed to this paper was actually done as an undergraduate during a summer project in 1999), move on to a couple of soil bacteria, have a mooch about in stable-isotope probing (one soil, one marine, both with work from my PhD) and then leap gleefully into shiga-toxigenic phage (these are the ones that make E. coli O157 nasty).

Looking at the positive, they are all in quite good journals. Delving into the innards of the Journal Citation Reports and coming out smelling of dust and politics, it is possible to discover that Environmental Microbiology is doing the best with an impact factor of 4.630, Applied and Environmental Microbiology is at 3.532, IJSEM is at 2.662 and the Journal of the MBA UK brings up the rear with a factor of 0.778 (though please don't judge it harshly - it's a cute little journal from one of the oldest marine biology societies in the world; I say one of, as I think it might be the oldest, but can't find the proof). The ISME Journal only launched this year, and you need three years of article-count data before an impact factor can be calculated.

So, considering the above and the fact that the deadline for fellowship submissions is 1st November, I decided to wait a year before submitting an application. With no first-author papers, no chance. That leaves me taking remedial action to boost what publications I've got, in order to make convincing applications for a second postdoctoral position. First job: that knotty paper from my PhD thesis that has been dogging me (and I have been dogged about) for some time.

Tuesday 9 October 2007

Can't make up my mind about Venter

Craig Venter causes me problems. Do I like what he does or not? He is a controversial figure - controversy that I think mainly stems from his intention to patent the human genome sequence produced by his company Celera. In a couple of weeks, on 25th October, he is releasing his autobiography A Life Decoded: My Genome: My Life (suffering from the sort of dreadful title long associated with scientists' autobiographies - see here), of which two extracts have been published in the Guardian (extract 1, extract 2): the first about the race between the two human genome projects, the second about his time in Vietnam. Coincidentally, this is also the predicted date for his creation of synthetic life. How's that for advertising?

Working in microbial ecology I had heard of Venter and the race to produce the first human genome sequence - incidentally, neither the Celera nor the Human Genome Consortium versions are actually complete; the assemblies of the separate bits of sequence change relatively regularly, and there are repetitive tracts that may be impossible to sequence and assemble correctly (we're currently up to version 36.2 according to the NCBI) - but my research was in an entirely different area of biology. Then came Sorcerer II and the attempt to sequence the sea, or at least all the bacteria in it that pass through a 0.8 micron filter but not a 0.1 micron one. The papers containing the detail of the expedition and some of the initial findings were published in the journal PLoS Biology. This produced a vast dataset of marine bacterial DNA sequence, massively increasing the amount of DNA from these organisms available in the DNA databases. Indeed, they had to set up their own database to manage the data (and wouldn't have been very popular had they not).

This is where I start to have problems. The data is an excellent resource for people to see whether their favourite gene is present in the samples, but isn't the work bad science? There is no hypothesis being tested by sequencing in this way, other than "we can sequence marine bacteria" (which reminds me of my favourite scientific paper - An Account of a Very Odd Monstrous Calf, by Robert Boyle - pdf). On the other hand, if you have the money and resources to do this kind of thing, why shouldn't you? It is an expedition rather than an experiment. See? Can't make up my mind. There is a presentation from Venter (on his yacht in a typically tropical part of the trip) available here - please ignore the Roche advert and note that even famous scientists can get a bad case of the "this next slide shows".

I think my problems boil down to motivation. Why does Venter want to sequence and patent the human genome? Why was one of the genomes his own? And, more recently, why did the project after that have the aim of creating synthetic life? Is it massive hubris, or is that entirely unfair?

Still - his competition to be the first to sequence the human genome certainly accelerated both projects, and I find the global ocean sequencing data quite handy. Plus, he is talking about applications of a synthetic bacterium, of which his Mycoplasma laboratorium is likely to be the first (as I understand it, there is currently a synthetic genome that has yet to be stuck in a cell, and only then does it become an organism), in the removal of atmospheric CO2 (echoes of Lovelock's call for direct action there).

I'm still undecided, but in case you want to see more of him, below is a TED talk covering some of his efforts. It was given in 2005 and predicted a synthetic bacterium in 2007, followed by a synthetic eukaryote by 2015. One down, one to go.



Incidentally, the TED talks site is a great place to find fascinating talks - my favourite so far being Sir Ken Robinson's; his description of academics on the dance floor is entirely accurate (although I'd add more pogoing).

Thursday 27 September 2007

Pipe dreams?

I noticed in my Nature alert today a short letter from James Lovelock and Chris Rapley about a potential mechanism for sequestering atmospheric CO2 in the seas. The plan described is to have large pipes (and they mean large - 100-200 m in length and 10 m in diameter!) floating vertically in the sea, allowing mixing of seawater above and below the thermocline. Phytoplankton (algae) require three main things in order to grow: light, nutrients and CO2. The thermocline is a divide between the sun-warmed surface water, which has sufficient light and CO2 for growth but few nutrients, and the cool, deeper water, which is obviously too dark but full of nutrients that have fallen out of the surface layers. There are natural places where the two waters mix, called upwellings - here you get blooms of phytoplankton, which use up CO2 as they photosynthesise. The pipes would simulate these upwellings and stimulate the uptake of CO2 by the phytoplankton. Phytoplankton also produce dimethyl sulphide (DMS - the characteristic "smell of the sea"), which can stimulate the formation of clouds, cooling the planet by preventing the sun's rays reaching the surface.

The story has also been picked up by the BBC here (and the New Scientist and umpteen other places - Nature is quite good at getting its stories published elsewhere before it has published them itself! My girlfriend used to tell me all the latest research news from the free paper Metro before I'd even received the original articles) and has some nice diagrams of the pipes in action.

Atmocean, a US company, has been developing such a system itself and has a suitably dramatic video on YouTube (note the use of Clubbed to Death by Rob D, as heard on the soundtrack of The Matrix - it signifies 'bad stuff happening').



It is great that someone is proposing direct action with some science behind it. The idea is an interesting one, but it probably won't work. Bold statement, huh? I'll try and justify it.

There are fairly large practical difficulties. The Atmocean CEO Phil Kithil, in an interview as part of the BBC article, says that "134 million pipes could potentially sequester about one-third of the carbon dioxide produced by human activities each year". It's not clear whether this refers to current levels - by the time the pipes were in place you'd need a lot more, as we'll have chucked a whole lot more into the atmosphere - but in any case, installing that many pipes would be quite a task.

Creating phytoplankton blooms doesn't only increase the level of DMS in the atmosphere above the bloom; it also increases the levels of other gases, including methyl bromide and isoprene (see this paper). Methyl bromide is a major source of bromine to the upper atmosphere, and bromine is better at destroying ozone than the chlorine from CFCs, so this has the potential to enhance ozone layer destruction. Isoprene has a complex and not completely understood role in atmospheric chemistry. Increasing the levels of these compounds in the atmosphere is likely to have an effect, which may be beneficial or otherwise, but as their roles are less well understood than that of DMS, is stimulating their production wise?

There is also likely to be a massive impact on the biology of the sea; organisms have complex and subtle interrelationships that we are barely starting to understand, particularly at the microbiological level - how this will affect the climate is also unknown.

Chris Rapley points out in the BBC article that his and Lovelock's letter is designed to stimulate discussion about direct action, which I hope it will - international agreements like Kyoto are mired in political torpor - but there is a danger that such dramatic suggestions, which to non-scientists could sound like science-fiction madness, only add weight to climate change apathy. It is the style of the solutions that captures the imagination rather than the problem itself (think giant solar reflectors in space and dumping iron filings in the sea).

However, if anyone can truly stimulate action it is Lovelock - his invention of the electron capture detector, a sensitive device for measuring tiny amounts of chemicals in the atmosphere, prompted Rachel Carson's book Silent Spring, ultimately leading to the genesis of the entire green movement. For further reading, James Lovelock's website is here.

Friday 14 September 2007

SGM Edinburgh Talk


SGM 6/9/07

Presentation given at SGM Metagenomics Hot-topic Session in Edinburgh, UK


Link: SlideShare Link