NYCOMPS uses bioinformatics to select genomes and determine valid targets to be processed in the pipeline at the NYSBC.
Reagent genomes
The NYCOMPS reagent genomes are the 90 prokaryotic (bacterial and archaeal) genomes from which we clone. We are currently expanding this set. The new genomes are chosen based on similarity to human transmembrane proteins, and they also complement the current reagent genomes well.

NYCOMPS reagent genomes cast onto the tree of life. (generated by iTOL: I Letunic & P Bork (2007) Bioinformatics 23:127-8)
NYCOMPS98 dataset
The NYCOMPS98 dataset is the set of sequences derived from the reagent genomes. We predict transmembrane alpha-helices for all sequences. Several further selection steps follow, for example redundancy reduction at 98% sequence identity and exclusion of sequences with long stretches of disorder. The sequences that remain form the NYCOMPS98 dataset and are processed in the pipeline.
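The selection steps above can be sketched roughly as follows. This is a minimal illustration, not the NYCOMPS implementation: the `Candidate` fields, the `min_tm` and `max_disorder` thresholds, and the `approx_identity` function are all assumptions made for the example. In particular, `approx_identity` is a crude stand-in for a proper alignment-based sequence identity; only the 98% cutoff comes from the text.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Candidate:
    name: str
    sequence: str
    n_tm_helices: int      # number of predicted transmembrane helices (hypothetical field)
    longest_disorder: int  # longest predicted disordered stretch, in residues (hypothetical field)


def approx_identity(a: str, b: str) -> float:
    """Crude identity proxy: matched residues / length of the shorter sequence.

    A real pipeline would use an alignment-based identity instead.
    """
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def select(candidates, min_tm=1, max_disorder=30, identity_cutoff=0.98):
    """Filter candidates, then greedily drop near-duplicates.

    min_tm and max_disorder are illustrative values, not NYCOMPS settings.
    """
    kept = []
    # Visit longer sequences first so each group is represented by its longest member.
    for cand in sorted(candidates, key=lambda c: len(c.sequence), reverse=True):
        if cand.n_tm_helices < min_tm:
            continue  # no predicted transmembrane helix
        if cand.longest_disorder > max_disorder:
            continue  # long stretch of predicted disorder
        if any(approx_identity(cand.sequence, k.sequence) >= identity_cutoff for k in kept):
            continue  # redundant with a sequence that was already kept
        kept.append(cand)
    return kept
```

Calling `select(candidates)` on a list of `Candidate` records returns the non-redundant subset that passes both filters, in order of decreasing sequence length.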
Target selection
For each target, an alpha-helical transmembrane protein of interest, we search the NYCOMPS98 dataset for all similar sequences. These are sent to the pipeline for cloning and expression. An overview can be seen in the following flowchart:
The first large set of targets was based on a list of E. coli proteins. These proteins are predicted transmembrane proteins that have already been successfully expressed in the lab of Gunnar von Heijne (Daley et al. (2005) Science). The second large set of targets consists of predicted human membrane proteins. In addition, targets can be nominated by experimental groups associated with NYCOMPS.
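The idea of expanding each target into all similar NYCOMPS98 sequences can be illustrated with a small sketch. The text does not say how similarity is measured, so the shared k-mer fraction used here, the `threshold` value, and the function names are purely assumptions for illustration. A real search would more likely rely on an alignment-based method.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Fraction of shared k-mers, relative to the smaller k-mer set (illustrative measure)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def expand_targets(seeds, dataset, threshold=0.5):
    """Map each seed target name to the dataset sequence IDs similar to it.

    seeds and dataset are dicts of name -> sequence; threshold is an assumed value.
    """
    return {
        seed_name: [
            seq_id
            for seq_id, seq in dataset.items()
            if similarity(seed_seq, seq) >= threshold
        ]
        for seed_name, seed_seq in seeds.items()
    }
```

The returned dictionary lists, for every seed, the dataset entries that would be forwarded to cloning and expression under these assumptions.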
