L. Van Warren
Special Student
University of Arkansas for Medical Sciences
Department of Biochemistry & Molecular Biology
August 23, 1999
Fig 1 - Sensory Rhodopsin from Swiss Model Database, visualized using Chem3D Pro.
Time Estimate
Rhodopsin structure construction time was estimated using my Substance P analysis as baseline, assuming a linear increase in complexity.
Table 1 - Substance P vs. Rhodopsin Comparative Processing Estimate
Substance P (11 residues long) | Rhodopsin (~341 residues long)
The escalation of complexity was therefore a factor of 31. This estimate was based on performing the condensation by hand. This preliminary analysis also neglected the important issue of correct protein folding.
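The linear scaling behind the factor of 31 can be written out as a short calculation. This is only a sketch of the arithmetic: the function names `complexity_factor` and `scaled_estimate` are invented here for illustration, and no baseline time from Table 1 is assumed.

```python
SUBSTANCE_P_RESIDUES = 11
RHODOPSIN_RESIDUES = 341  # approximate length quoted in the text


def complexity_factor(baseline_residues, target_residues):
    """Ratio of residue counts, assuming complexity grows linearly."""
    return target_residues / baseline_residues


def scaled_estimate(baseline_time, baseline_residues, target_residues):
    """Scale a baseline processing time by the linear complexity factor.

    baseline_time is a caller-supplied figure in any unit; it is a
    hypothetical input, not a value taken from the original table.
    """
    return baseline_time * complexity_factor(baseline_residues, target_residues)


factor = complexity_factor(SUBSTANCE_P_RESIDUES, RHODOPSIN_RESIDUES)
print(factor)  # 31.0
```

Any per-step time measured for Substance P can be passed to `scaled_estimate` to get the corresponding hand-condensation estimate for Rhodopsin, with the same caveat that folding is ignored.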
This section is a large and detailed set of query results, kept on a separate set of linked pages. A single analog was selected: Homo sapiens rhodopsin, sequenced by J. Nathans and retrieved for this work from the Entrez database listed in the linked pages.
In the previous example for Substance P, the units were assembled by manually pasting each amino acid into a graphical chemistry editor and performing the condensations by hand, gradually escalating in complexity:
Fig 2 - Sensory Rhodopsin from Swiss Model Database, hand condensed using ChemDraw.
The resulting structure was then translated into SMILES format using the "Copy As SMILES" option of CambridgeSoft ChemDraw and sent to the Corina structure server used in the Substance P project. This worked up to the point where the number of residues exceeded the capacity of the Corina server, which, by inference, lies between the 60 and 90 residue cases.
Fig 3 - Hand Condensed Rhodopsin Residues 1-10
Fig 4 - Hand Condensed Rhodopsin Residues 1-60
Fig 5 - Hand Condensed Rhodopsin Residues 1-90
These cases were later reprocessed using the Chem3D product from CambridgeSoft to produce the ".mol" format files shown above.
Screen clutter became difficult to manage with over 300 amino acids waiting for their turn to be assembled into the condensed structure.
Amino Acid Quick Lookup Table
A Linguistic Aside
There are about as many amino acids as there are letters in the English alphabet. Unlike their Asian counterparts, English alphabetic characters have very little picture-carrying capability: they don't look like anything.
Consider a list of the 20 amino acids named TableZ. If we take all pairs of amino acids, corresponding to the Cartesian product of TableZ with itself, there are 20 x 20 combinations, or 400 entries in the table. We name this TableZZ, where the ZZ can stand for any pair of amino acids. If we take all triplets, there are 8000 entries in a table named, you guessed it, TableZZZ. TableZZ, with 400 pairs, is about one fourth the size of the simplified Japanese Kanji alphabet. TableZZZ is about four times as complex as Kanji.
Kanji, unlike English, is an alphabet with per-character picture-carrying capability. Amino acids linguistically resemble their English alphabetic counterparts and lack the ability to convey significant two-dimensional image structure on a per-residue basis. Thus residues are a character-oriented programming language, a language that when executed by the cell results in a living organism. This character language, when executed on the small scale, results in the synthesis of proteins. When executed on the large scale, the consequence is the construction of biological organisms, complete with organ-to-organ "jumping gene" hormone signaling.
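The table sizes quoted in the aside can be checked by building the tables directly. The sketch below assumes the 20 standard one-letter amino acid codes as the base alphabet; the helper name `build_table` is made up for this illustration.

```python
from itertools import product

# The 20 standard amino acids, by one-letter code (assumed base alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_table(k):
    """Return every string of k residues: the k-fold Cartesian product."""
    return ["".join(combo) for combo in product(AMINO_ACIDS, repeat=k)]


table_z = build_table(1)    # single residues
table_zz = build_table(2)   # all ordered pairs
table_zzz = build_table(3)  # all ordered triplets

print(len(table_z), len(table_zz), len(table_zzz))  # 20 400 8000
```

Each added Z multiplies the table size by 20, which is the combinatorial growth the aside goes on to discuss.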
If we use entries in TableZZ instead of the standard abbreviations, we can transmit the 340 amino acid Rhodopsin molecule using only 170 symbols, but we have to draw from a 400 symbol alphabet. There is some overhead associated with this. To use a communication theory analogy, the first time we string our tin cups together we have to transmit 570 symbols (the 400 entries of the table plus the 170 symbols of the message), plus a little overhead for things like, "here is how to read what I send you". Subsequent transmissions of even numbered Rhodopsins (or anything else) only require 170 symbols. There is a slight inconvenience also. If we wanted to transmit a 341 amino acid Rhodopsin we would have to use an alphabet drawn from TableZZ and TableZ, since there would be one lone straggler who would not be in TableZZ. Thus we would have to draw from a 420 symbol alphabet, the concatenation of symbols listed in TableZZ and TableZ. We can apply this argument recursively, run length encoding the amino acids and looking for their corresponding entries in tables of consecutively increasing size. Just as combinatorial explosion is about to defeat us, we might speculate that at some table complexity, say Table32Z (32 Z's in a row, but 32Z is shorter!), the table becomes sparse, because not all entries of length 32 are used in the assembly of creature nation.
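The pair encoding can be sketched in a few lines. Everything here is illustrative: `encode_pairs` and `decode_pairs` are hypothetical helper names, and the toy sequences are made-up stand-ins of the right length, not the actual Rhodopsin sequence.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard one-letter codes (assumed)
TABLE_Z = list(AMINO_ACIDS)
TABLE_ZZ = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

# Combined 420-symbol alphabet: 400 pairs followed by 20 single residues.
ALPHABET = TABLE_ZZ + TABLE_Z
INDEX = {symbol: i for i, symbol in enumerate(ALPHABET)}


def encode_pairs(sequence):
    """Encode residues two at a time; a lone final residue uses TableZ."""
    symbols = [INDEX[sequence[i:i + 2]] for i in range(0, len(sequence) - 1, 2)]
    if len(sequence) % 2:
        symbols.append(INDEX[sequence[-1]])
    return symbols


def decode_pairs(symbols):
    """Invert encode_pairs by looking each symbol up in the alphabet."""
    return "".join(ALPHABET[s] for s in symbols)


toy_340 = AMINO_ACIDS * 17   # made-up 340-residue stand-in
toy_341 = toy_340 + "M"      # odd length, so one straggler

print(len(encode_pairs(toy_340)))                  # 170
print(len(encode_pairs(toy_341)))                  # 171
print(len(TABLE_ZZ) + len(encode_pairs(toy_340)))  # 570, first transmission
```

Strictly speaking this is a fixed dictionary code over pairs rather than run length encoding in the usual sense; the recursive version described in the text would repeat the same lookup with longer table entries.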
There are two bits of reckoning in the previous argument. The first is that 32 is the magic number where the sparsity of the table allows it to be stored in more compact form. The second is that nature doesn't use every possible combination. Of course I can't prove that, but it seems likely enough to serve as a good working fiction. End Aside.
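The sparsity reckoning in the aside above can at least be bounded with a counting argument: a sequence of length n contains at most n - k + 1 distinct windows of length k, while a full table holds 20^k entries. The sketch below counts the windows actually present; `distinct_kmers` and `table_occupancy` are hypothetical names, and the input is a made-up sequence, not real protein data.

```python
def distinct_kmers(sequences, k):
    """Collect every distinct run of k consecutive residues."""
    seen = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return seen


def table_occupancy(sequences, k, alphabet_size=20):
    """Fraction of the full alphabet_size**k table that is actually used."""
    return len(distinct_kmers(sequences, k)) / alphabet_size ** k


# Made-up 341-residue stand-in; a real survey would use many real sequences.
toy = "ACDEFGHIKLMNPQRSTVWY" * 17 + "M"

used = distinct_kmers([toy], 32)
print(len(used) <= len(toy) - 32 + 1)  # True: at most 310 windows
print(table_occupancy([toy], 32))      # vanishingly small
```

This only shows that a table of 32-residue entries must be sparse for any realistic number of sequences; it says nothing about whether 32 is the right cutoff, which remains the working fiction described in the text.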
The Amino Acid Quick Lookup Table is a useful tool for hand condensation, but is too large to fit here.
This page, too large to include here, asks why condensation is so reliably constrained to the primary Nitrogen when other sites would appear to be available.
Acknowledgments
It was suggested by Dr. Steven Fliesler that the techniques developed for Substance P could be applied to Rhodopsin. He also urged me to take advantage of NMR work being applied to this problem, an aspect I have not yet been able to get to.
The Corina 3D structure service continues to make a difficult task nearly effortless. Thanks to MDL for developing the Chime plug-in for 3D viewing in Netscape and Explorer. Thanks to Eric Martz for clarifying Chime installation procedures on the server side. Dr. Wolf-D. Ihlenfeldt of the Computer Chemistry Center, University of Erlangen-Nuernberg, Erlangen (Germany), developed the service that makes any of this possible. A special thanks also to Mark Turpin, whose web sponsorship continues to make this possible.