Friday, November 15, 2013

You gotta go for what you know, make everybody see, in order to fight the powers that be

So I get this question a lot. 

How many replicates do I need for a proteomic profiling experiment? 


(Image: statistical power chart, from http://www.real-statistics.com/wp-content/uploads/2012/11/statistical-power-chart.png)

It's a great question, and unfortunately one I have no idea how to answer with any accuracy.

I remember reading in a book somewhere (wish I remembered where) that whenever the author was asked this question, he would say: 30 biological replicates! Well, that's great if you can actually generate 30 replicates and then pay for them to be analyzed.... But alas, in the world of proteomics, this is usually unobtainable.

So what to do....

Let me start by explaining why I have no idea how to answer this question. It all has to do with Power and how it is calculated.

Power is, in essence, your ability to detect a change in your sample when such a change actually exists. In other words, it is one minus your chance of committing a type II error (a false negative).

This is one of the best explanations of Power calculations I have seen (it's worth reading):

http://www.refsmmat.com/statistics/power.html

Some bullet points:


  • Power depends on the number of replicates you do, the variation in your sample, and the magnitude of the change you are trying to detect.

So the more replicates you analyze, the higher your chances of detecting a low-magnitude change that has a large variation.

Now, this can be calculated in a relatively straightforward way if, say, you have a small number of variables you are interested in and you know your variation. I think, anyway....
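For what it's worth, here is a minimal sketch of that single-variable case in Python, assuming normally distributed measurements, a two-sample t-test, and a crude conversion from fold change and %CV to an effect size (the numbers and the conversion are my assumptions, not anything from a specific paper):

```python
# A minimal power calculation for ONE variable, assuming normal data
# and a two-sample t-test. Requires statsmodels (pip install statsmodels).
from statsmodels.stats.power import TTestIndPower

fold_change = 1.5  # magnitude of change you want to detect (hypothetical)
cv = 0.25          # within-group coefficient of variation, 25% (hypothetical)

# Crude effect-size conversion: with the control mean normalized to 1,
# the mean difference is (fold_change - 1) and the within-group SD is cv.
effect_size = (fold_change - 1.0) / cv

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,    # chance of a false positive (type I error)
    power=0.80,    # 1 - chance of a false negative (type II error)
)
print(f"~{n_per_group:.1f} biological replicates per group")  # roughly 5 here
```

Change any one of the three knobs (replicates, variation, magnitude) and the other two move, which is the whole point of the bullet above.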

Now is that what we have in a proteomics or RNA-Seq experiment?

Unfortunately no

We have thousands of variables! And to make matters worse, each one of those variables has its own variation and magnitude.

And it gets even more complicated with bottom-up proteomics. Are you defining your variables as peptides or proteins? Ultimately we are interested in the proteins (most of the time), but we need to reconstruct those proteins from the peptides we identify. Each of those peptides may have a different variation (S/N) and magnitude of change across samples. How do we calculate power at the protein level....I have no idea.

For example:

If you have a peptide with a S/N of 1000:1 and a %CV of 5%, you would need fewer replicates to detect a change in that peptide (depending on the magnitude of the change) than for a peptide with a S/N of 2:1 and a %CV of 50%. Now let's say both of those peptides map back to the same protein.... How do you calculate the power of that? This can actually be quite common, as peptides can vary in their ionization efficiency by maybe 4-5 orders of magnitude (I need to look up the cite here, but I think that's a reasonable guess), and their ability to be cut out by trypsin can vary a lot as well (proteotypic vs. non-proteotypic).
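As a toy illustration of that contrast (same hedges as above: normality, a two-sample t-test, and a crude effect-size conversion; the peptide numbers are made up):

```python
# Toy comparison: same small change, very different peptide quality.
from statsmodels.stats.power import TTestIndPower

def replicates_needed(fold_change, cv, alpha=0.05, power=0.80):
    """Approximate biological replicates per group for a two-sample t-test."""
    d = (fold_change - 1.0) / cv  # crude effect size, as in the sketch above
    return TTestIndPower().solve_power(effect_size=d, alpha=alpha, power=power)

# A subtle 1.2-fold change in each peptide:
print(replicates_needed(1.2, cv=0.05))  # clean peptide: ~3 per group
print(replicates_needed(1.2, cv=0.50))  # noisy peptide: ~100 per group
```

How you roll those two very different answers up into one number for the parent protein is exactly the part I can't tell you.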

There have been a few papers on calculating the power of proteomics experiments over the years, this one being one of the better ones.


If you look at this figure from the above paper


You can see a larger version of this figure in the paper, but essentially, if you can somehow know that your protein has a 50% variation between groups (how you calculate this ahead of time, I have no idea...), then at a power of 0.8 (only a one in five chance of missing the change) you would need 15 biological replicates per group to detect a 1.5-fold change. If you have a 25% variation, you would only need 5 replicates. The good news is that even if you have a 100% variation, you may need only 5 replicates to detect a 3-fold change. I think that's probably realistic...
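If you want to play with that relationship yourself, here is a rough sweep using the textbook large-sample approximation. This is my own back-of-the-envelope, not the paper's model, so the numbers will not match its curves exactly (and the approximation undershoots a little when n is small):

```python
# Rough n-per-group sweep: n ~= 2 * ((z_alpha/2 + z_power) / d)^2,
# with the same crude (fold change, CV) -> effect size conversion.
from scipy.stats import norm

z = norm.ppf(1 - 0.05 / 2) + norm.ppf(0.80)  # alpha = 0.05, power = 0.8
for cv in (0.25, 0.50, 1.00):                # 25%, 50%, 100% variation
    for fold in (1.5, 2.0, 3.0):
        d = (fold - 1.0) / cv                # crude effect size
        n = max(2 * (z / d) ** 2, 2)         # floor at 2 replicates
        print(f"CV {cv:.0%}, {fold:.1f}-fold change: ~{n:.0f} per group")
```

For 50% CV and a 1.5-fold change this lands around 16 per group, the same ballpark as the figure.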

Does the protein you are interested in have a 50% variation, a 25% variation, a 5% variation? I really have no idea. Most likely, among the 1,000+ proteins we can identify, you will have some with low variation (usually the high-abundance and less interesting ones) and some with large variation (the less abundant, more interesting ones).

One can only get a handle on the variation after the experiment is run. At that point you can start calculating your %CVs and S/N (1/CV), but I don't know of a way to do this beforehand.
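The after-the-fact part, at least, is easy arithmetic. A sketch with a made-up intensity matrix (proteins in rows, replicates in columns):

```python
import numpy as np

# Hypothetical protein intensity matrix: rows = proteins, columns = replicates.
intensities = np.array([
    [1.00e6, 1.05e6, 0.95e6, 1.02e6],  # well-behaved, high-abundance protein
    [3.0e3,  9.0e3,  1.5e3,  6.0e3],   # noisy, low-abundance protein
])

means = intensities.mean(axis=1)
sds = intensities.std(axis=1, ddof=1)  # sample SD across replicates

percent_cv = 100.0 * sds / means       # %CV per protein
sn = means / sds                       # the S/N = 1/CV figure from above

for i, (cv, s) in enumerate(zip(percent_cv, sn)):
    print(f"protein {i}: %CV = {cv:.1f}%, S/N = {s:.1f}")
```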

So: the more replicates the better, and the less variation the better. But it's still possible that you will need those 30+ replicates to see that low-magnitude change.....



Wednesday, November 13, 2013

Nano-LC isn't easy but it's necessary so I’m chasing peaks like Tom chases Jerry


Man, if I had a quarter for every time my Nano-LC system failed. I mean, it's broken more than it works. And how many of them do I have? Five?


Instead of going all Office Space on my Nano-LCs, I thought I'd praise the virtues of nano-LC and explain why you need to just bite the bullet and live with it.


Everyone who has done it knows how painful it is, and if it isn't painful, you are either not pushing it to its limits or maybe just injecting clean standards and not real, dirty biological samples. If nano-LC weren't an issue, you would not have companies touting their Easy-LC (which is anything but easy), their chip-based systems, or their direct-spray systems like Bruker's acquisition (and ultimate destruction) of Michrom's CaptiveSpray system (RIP). I've tried all of them, and they generally are not worth it.



Let's get to the virtues of nano-LC first.


  1. You really get a sense of accomplishment whenever it works.
  2. You can make most of the parts yourself (and you really should).
  3. It really is more sensitive than anything else.
  4. If you don’t take labor into account it’s way cheaper.
  5. Job security, as it really does take a lot of practice/skill to get really good at it.


Some of the Downsides


  1. Making the parts and troubleshooting can get very expensive if you take labor into account.
  2. Very easy to overload your columns.
  3. The autosamplers on Nano-LC systems generally suck. I still know people who do not use them and bomb-load everything.
  4. When it is working, you should not talk about it or stare at it intently. The Nano-LC system will know and break on you.
  5. Reproducibility can be poor during long runs of samples as the column and spray tip gradually deteriorate.


I'm generally skeptical when people say they run their nano-spray system for months nonstop on real biological samples. If you are not too concerned with RT and sensitivity reproducibility, I guess this is possible. But if you, say, run standards every 5-10 runs, you'll see how fast things can go downhill. Remember, every time you inject a sample you are changing your column. That carryover you see is there because you are not recovering 100% of your sample from your LC system. And that carryover changes your system in subtle and not-so-subtle ways.


Here are some tips I have come up with over the years:


  • Making your own columns and traps and packing them is not that hard. I'll put up a tutorial some day. If you buy a nano-column from a company, they are most likely pulling it with a Sutter laser puller and packing it with a pressure vessel, just like you can.
  • Most people use a vented column setup with a tee or a cross (for the HV).
    • Watch out for those clogging; they are a major point of failure.
  • Main points of failure on your LC system (depending on how it works):
    • Rotors (rotor seals mainly)
    • Piston Seals
    • Check Valves
    • Tubing getting clogged
      • Which is why I do not recommend PEEKsil
    • Fittings not connected correctly and introducing dead-volume

Just remember, if you are just getting started: it gets easier. And don't be tempted by these all-in-one nano-solutions you see for sale. They will ultimately fail in some way, and you will have a hard time fixing the problem yourself if you do not know how the system works. When you have samples piling up, it's far better if you can fix it yourself. Of course, there are always going to be real hardware failures you can't fix, but most of the routine problems you can.