11/15/99
Dr. Steven C. Elbein
Genetic Biochemistry
L. Van Warren

Lecture Notes:
Identifying
Human Disease
Genes

1:2:1

 

I. Identifying human disease genes

A. Candidate gene approach

1. Nonpositional approach

A candidate gene may be suggested without any INITIAL knowledge of chromosome location if the manifested trait resembles another already known trait in humans or animals. This is becoming less prominent in the wake of the Human Genome Project.

2. Positional candidate genes

Once a disease has been mapped, it is possible to use database searches to identify candidate genes in the mapped region. As the human gene map becomes more and more detailed, particularly the human morbidity map, this approach appears ready to become the dominant one.

3. Functional approach

a. Isolate protein and clone gene of interest

In this approach one would isolate the protein that contributes to the disease state. The isolated protein could be sequenced directly to determine its peptide sequence, which could then be used to generate the nucleotide sequences that could have been translated to produce it. The nucleotide sequence is not unique, because the genetic code is degenerate: of the 64 possible nucleotide triplets, 61 code for only 20 amino acids (the other 3 are stop codons), so different nucleotide sequences can code for the same protein. With the peptide sequence known, one could generate the possible nucleotide sequences on the fly. Because of combinatorial explosion and intermediate expression swell, a very large number of possible nucleotide sequences would have to be compared. However, an on-the-fly search technique would allow only promising candidate genome regions to be screened for the protein of interest, greatly easing the computational burden of comparing sequences.
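
As a rough illustration of the degeneracy described above, the following Python sketch counts how many nucleotide sequences could encode a given peptide under the standard genetic code. The function name count_encodings and the example peptides are assumptions made for illustration; they are not from the lecture.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, written in the usual T, C, A, G table order.
# "*" marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons available for each of the 20 amino acids.
CODONS_PER_AA = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def count_encodings(peptide):
    """Return how many DNA sequences translate to the given peptide.

    The peptide is given in one-letter amino acid codes (hypothetical
    input format chosen for this sketch).
    """
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


if __name__ == "__main__":
    print(len(CODONS_PER_AA), "amino acids,",
          sum(CODONS_PER_AA.values()), "sense codons")
    for peptide in ("MW", "LSR", "MKTAYIAKQR"):
        print(peptide, count_encodings(peptide))
```

The count is a product over residues, so it grows very quickly with peptide length; this is the combinatorial explosion mentioned above, and it is why short, low-degeneracy stretches of a peptide are the natural place to start a search.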

(1) Hybridization
(2) PCR screening from oligonucleotides

b. Clone from gene expression
c. Select genes from pathway or gene family
d. Complementation in yeast

B. Positional cloning

1. No known candidate genes
2. Loss of heterozygosity
3. Chromosomal abnormalities and breakpoints
4. Linkage analysis

a. Genetic maps and families
b. Physical map
c. Localizing a disease gene

(1) Crossovers
(2) Linkage disequilibrium

5. Cloning by animal models

C. Complications in finding Mendelian disease genes

1. Locus heterogeneity

a. Multiple genes may cause the same disease phenotype
b. The disease loci are usually unlinked (on different chromosomes)

2. Allelic heterogeneity

a. Multiple different mutations may cause the same disease

D. Detecting mutations in candidate genes

1. Types of human mutations

a. Missense mutations

A missense mutation alters a codon to that of another amino acid, causing an altered translation product to be made.
"Missense alters for better or for worse."

b. Deletions and insertions
c. Nonsense mutations

A nonsense mutation alters a codon so that it codes for a stop signal, as in amber (UAG), ochre (UAA), or opal (UGA). This prevents the protein from being completely formed.
"Stopping is nonsense."

d. Chromosomal rearrangements
e. Triplet repeat diseases

Huntington's disease (HD) is a good example of a triplet repeat disease caused by an increase in the number of triplet repeats. In the normal population there are 11 to 34 CAG trinucleotide repeats in the proximal portion of the 500 kb segment between D4S180 and D4S182 on chromosome 4p16.3. In HD there is an increase to 39 to 66 copies [Motulsky]. A second example is myotonic dystrophy, the most common type of adult muscular dystrophy. The remarkable finding is the correlation between the size of the trinucleotide repeat and both severity and age of onset. Normal individuals average 5 copies of a CTG repeat. Minimally affected individuals had about 50 copies. Severely affected individuals had 1000 copies [Klug]. Some authors comment that the length of such repeats may be responsible for fine tuning of transcription and that repeat diseases are frequently nervous system disorders.
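
The mutation types listed above can be made concrete with a short Python sketch. It labels a single-codon change as missense or nonsense using the standard genetic code, and counts the longest run of a trinucleotide repeat in a sequence. All function names are hypothetical, the "silent" and "stop-loss" labels are additions not covered in the notes, and the HD ranges are simply the figures quoted above, not clinical cutoffs.

```python
import re
from itertools import product

# Standard genetic code in T, C, A, G table order; "*" is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Translate one DNA or RNA codon (U is treated as T)."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify_point_mutation(ref_codon, alt_codon):
    """Label a codon change by its effect on the translation product."""
    ref, alt = translate_codon(ref_codon), translate_codon(alt_codon)
    if ref == alt:
        return "silent"      # same amino acid (not discussed in the notes)
    if alt == "*":
        return "nonsense"    # new stop codon: amber, ochre, or opal
    if ref == "*":
        return "stop-loss"   # stop codon lost (not discussed in the notes)
    return "missense"        # different amino acid


def longest_repeat_run(sequence, unit="CAG"):
    """Return the largest number of back-to-back copies of `unit`."""
    unit = unit.upper()
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def hd_range(cag_copies):
    """Compare a CAG copy number with the ranges quoted in the notes."""
    if 11 <= cag_copies <= 34:
        return "normal range quoted in the notes"
    if 39 <= cag_copies <= 66:
        return "HD range quoted in the notes"
    return "outside both quoted ranges"


if __name__ == "__main__":
    print(classify_point_mutation("GAG", "GTG"))  # Glu -> Val
    print(classify_point_mutation("UGG", "UAG"))  # Trp -> amber stop
    toy_sequence = "AT" + "CAG" * 42 + "GG"       # made-up sequence
    copies = longest_repeat_run(toy_sequence)
    print(copies, hd_range(copies))
```

A real repeat assay sizes the PCR product rather than scanning a sequence string, so the counting function is only a model of the idea.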

2. Methods of mutation detection

a. SSCP: Single strand conformation polymorphism.

Single strand conformation polymorphism (SSCP) is a technique for detecting known and unknown mutations in PCR products of DNA and RNA.

"SSCP AND mutation" is a very informative search query.

BRCA1 in cancer cases.
BRCA2 in cancer cases.

b. DGGE: Denaturing Gradient Gel Electrophoresis

DGGE is a common method for detecting single base changes in both cloned and genomic DNA.

c. Heteroduplex mismatch screening

A heteroduplex is double-stranded DNA in which the two DNA strands do not show perfect base complementarity.

d. Chemical cleavage and enzymatic cleavage
e. Protein truncation test
f. Sequencing and resequencing

E. Genetic testing

1. Indirect testing by linkage

a. Utility depends on other affected family members
b. Recombination limits detection
c. Requires highly polymorphic markers near the gene

(1) Flanking markers improve prediction

d. Tests of identity: DNA fingerprint analysis

2. Direct testing by gene screening
a. Sample types
b. Scanning for unknown mutations

(1) Necessary in many genes where no single mutation is prominent
(2) Same methods as mutation detection

c. Testing for known mutations

(1) Presumes little locus heterogeneity

(a) Several common mutations or a single type of mutation
(b) Triplet repeat diseases, CFTR, hemoglobinopathies

(2) Methods depend on detection of known mutation at single location
(3) insertion:deletion mutations

(a) gel based detection

(4) Single nucleotide polymorphisms

(a) DNA chips
(b) Heteroduplex mismatch

i) limited ability to detect homozygous mutations

(c) PCR-RFLP
(d) ASO
(e) Allele-specific PCR primers

i) OLA on PCR product
ii) ligase chain reaction on DNA product

d. Who should be screened for disease genes?

II. Complex diseases

A. What is a complex disease?

Hallmarks of complex disease are:

a) polygenic

polygenic - many genes participate in disease, i.e. > 1 locus involved
oligogenic - a few genes involved
monogenic - one gene involved
Cystic Fibrosis (CF) with its single locus represents an exception.

b) environmental
c) nonMendelian.

Complex diseases will combine sums and products of multiple interacting loci.

1. No apparent Mendelian inheritance
2. Multiple interacting loci

a. Epistasis (multiplicative model)

In a model where one gene affects the expression of others, the effects are multiplicative.

b. Heterogeneity (additive model)

In a model where multiple loci contribute independently, an additive model applies. Consider Lang's colon cancer example, where a complement of multiple slow and rapid cyclers controls tumor suppression and carcinogenesis. Another example is parallel pathways: all pathways must be blocked before disease is seen.

Locus heterogeneity (p. 65): mutations in several genes result in the same clinical phenotype. Example: diabetes.

3. Determining genetic impact and degree of inheritance

a. Twin studies
b. Familial clustering
c. Lambda values for different relationships

Lambda is the risk associated with a specific locus.
We can sum over the loci to produce the total risk lambda.

d. Segregation studies

(1) Attempts to explain mean and variance in terms of simple Mendelian models
(2) Often incorrect when the true model is complex

e. Heritability

(1) Divides the variance into environmental variance and variance due to shared genes
(2) Heritability is the proportion of total variance that is genetic:

Variance V = Vg + Ve + Veg ; // genetic, environmental, and gene-environment interaction contributions respectively.

h2 = Vg/V = Vg/(Vg + Ve + Veg)

Must watch for ascertainment error.

(3) In reality, heritability often fails to separate shared environment from shared genes

Twin studies help to solve the difficult problem of separating environment from genetic predisposition.
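
The heritability ratio above is simple arithmetic, and a small Python sketch makes the roles of the variance terms explicit. The function name and the variance values are made up for illustration; estimating the components themselves from family or twin data is the hard part and is not shown.

```python
def heritability(v_genetic, v_environment, v_interaction=0.0):
    """Return h2 = Vg / (Vg + Ve + Veg).

    v_interaction is the gene-environment interaction term (Veg);
    it defaults to zero when no interaction is assumed.
    """
    total = v_genetic + v_environment + v_interaction
    if total <= 0:
        raise ValueError("total variance must be positive")
    return v_genetic / total


if __name__ == "__main__":
    # Hypothetical variance components, chosen only to show the arithmetic.
    print(heritability(3.0, 1.0))        # no interaction term
    print(heritability(2.0, 1.0, 1.0))   # with an interaction term
```

Note that any shared environment that is mistakenly counted in Vg inflates h2, which is the failure described in point (3).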

f. Distinguishing environment from genetic factors

(1) adoption studies
(2) twins reared apart

4. Consequences of complex disease genes

a. Late age of onset
b. Diseases are often common
c. No direct genotype:phenotype correspondence

(1) Reduced penetrance

Penetrance is the fraction of individuals with a given genotype who show the expected phenotype. With low (reduced) penetrance, many carriers of the genotype are unaffected.

(2) Phenocopies


Notes on proper information organization.

0) Biochemistry is, first of all, an information management problem.

1) It would be more appropriate to designate, in order:

a) the disorder, e.g. Huntington's disease
b) the genetic alteration, its location and its sequence and its category, e.g. triplet repeat
c) methods that could now be used to find it.
d) the original methods that were used to find it, i.e. the history.

Too often these concepts are interleaved in a confusing way, which creates the effect known as "drinking from a firehose" when attempting to cover large amounts of information.

2) In a presentation, open with the bottom line; open with your best slide. Then, if there is interest, provide the next appropriate level of detail. Too often we are dragged through the tedium of endless detail (everybody has a boring story) and the essential message and discovery are lost.

 


Test Questions

1) Describe the LOD score, its formula, and meaning.
2) Describe the parametric vs. nonparametric methods.
3) Describe ways of mapping complex disease genes.
4) Give definitions of polygenic, versus oligogenic, versus monogenic
5) Describe the difference between a quantitative or continuous trait and a dichotomous or disease trait.
6) Describe the definition of lambdaSubS.
7) What is segregation analysis?

Test strategy:
In your answers remember to comment on what is available on the web and what role you think computers will have in solving these problems.

1) state your understanding of the question.
2) state your answer
3) state an example
4) state an experiment that might be used to discover the answer
5) draw a picture
6) state the role of computing machinery in answering the question.

Gene Therapy Trends Gleaned from End of Lecture

Dr. Elbein remarked that gene therapy could be delivered via transfection to lung and bone marrow via adenovirus. We discussed whether transfections would be stable or if additional transfections would be required to maintain treatment. From this discussion it appears that three kinds of gene therapy will emerge:

1) diseases for which direct gene therapy is applicable.
In this case, the gene therapy will be targeted to repair the specific error(s) via adenovirus transfection or other suitable deployment strategy.

2) indirect gene therapy.
Programming gene therapy solutions that act remotely. For example, bone marrow repairs show up in the circulatory system, in hematopoiesis, and in the lymphatic system. This could be used to enhance or recruit effective immunological responses.

3) diseases for which gene therapy is ineffective.
It may be that there are classes of disease that will remain problematic even though it is possible to specify the repair, and to deliver it.

 

 

Complex Disease

inherited?

consider twin studies

monozygotic twins: share all genes

dizygotic twins: share 1/2 genes, but have higher "concordance"
                        dizygotes are essentially siblings of the same age.

Case           Expressors
Monozygotic    70 - 90 %
Dizygotic      30 %

familial aggregation

Compare specific models to least restrictive, most general model.

Use maximum likelihood to compare each with restricted analysis.

General model is validated and expanded by incremental testing with restricted models.

Modeling complex disease as recessives is a common error.


Compare risk with random population.
Compare risk of sibling with random population.

Risk ratio is lambdaSubS

Total Risk = lambdaSubS = sibling risk/general population risk

Risk of a sibling might be 3-5 times that of the general population

A lambdaSubS of 1 means no increased risk.

We can define a locus-specific lambdaSubS.

Total risk is sum of loci risk:

lambdaSubSTotal = sum(i, 1, n, lambdaSubSLocusSubi);
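
The sibling risk ratio defined above can be sketched in a few lines of Python. The function name and the risk figures are hypothetical, picked only so that the result falls in the 3-5 range mentioned in the notes.

```python
def sibling_risk_ratio(sibling_risk, population_risk):
    """Return lambdaSubS = sibling risk / general population risk."""
    if population_risk <= 0:
        raise ValueError("population risk must be positive")
    return sibling_risk / population_risk


if __name__ == "__main__":
    # Made-up risks: 6% for siblings of affected people, 2% overall.
    ratio = sibling_risk_ratio(0.06, 0.02)
    print(f"lambdaSubS = {ratio:.1f}")
    # Equal risks give a ratio of 1, i.e. no increased risk for siblings.
    print(sibling_risk_ratio(0.02, 0.02))
```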


Consider oligogenic disease with 4-5 participating loci:

Two models for loci interaction:

1) Additive Model (Heterogeneity)

2) Multiplicative Model (Epistatic or Interactive Model)

Complex disease will combine sums and products of additive and multiplicative models.

Parallel pathways:

block both pathways and you will see disease, as in Lang colon cancer lecture (slides forthcoming)

Heritability

 

Two-Point Mapping: relates to computing LOD scores; see page 320, chapter 12 of the book from which the lecture was taken.
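
Since the notes do not spell out the calculation, here is a minimal Python sketch of a two-point LOD score for the simplest textbook case: phase-known, fully informative meioses, where the likelihood of k recombinants in n meioses at recombination fraction theta is theta^k (1 - theta)^(n - k), compared against free recombination (theta = 0.5). The function names and the example counts are assumptions for illustration; real pedigrees need the full likelihood machinery of a linkage program.

```python
from math import inf, log10


def lod_score(recombinants, meioses, theta):
    """LOD = log10( L(theta) / L(0.5) ) for phase-known meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # Complete linkage is impossible once a recombinant is observed.
        return meioses * log10(2) if recombinants == 0 else -inf
    log_l_theta = (recombinants * log10(theta)
                   + nonrecombinants * log10(1.0 - theta))
    log_l_null = meioses * log10(0.5)
    return log_l_theta - log_l_null


def best_theta(recombinants, meioses, steps=50):
    """Grid-search theta in [0, 0.5] for the highest LOD score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: lod_score(recombinants, meioses, t))


if __name__ == "__main__":
    k, n = 2, 10  # hypothetical counts: 2 recombinants in 10 meioses
    theta_hat = best_theta(k, n)
    print(f"best theta = {theta_hat:.2f}, "
          f"LOD = {lod_score(k, n, theta_hat):.2f}")
```

By convention a LOD score of 3 or more is usually taken as evidence for linkage, so ten meioses with no recombinants (LOD of about 3.01 at theta = 0) is roughly the minimum informative family material in this idealized setting.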


Glossary: those hot monkey love, point grubbing definitions you love to know and tell your friends!

excerpted from [Klug]

anticipation: A song by Carly Simon. AND

A phenomenon first observed in myotonic dystrophy, where severity of symptoms increases from generation to generation and age of onset decreases from generation to generation. Caused by the expansion of trinucleotide repeats within or near a gene.

bivalents: Synapsed homologous chromosomes in the first prophase of meiosis.

chiasma (pl., chiasmata) The crossed strands of nonsister chromatids seen in diplotene of the first meiotic division. Regarded as the cytological evidence for exchange of chromosomal material, or crossing over.

chi-square test: Statistical test to determine if an observed set of data fits a theoretical expectation.

chromosomal polymorphism: Alternative structures or arrangements of a chromosome that are carried by members of a population.

codominance: condition in which the phenotypic effects of a gene's alleles are fully and simultaneously expressed in the heterozygote.

coefficient of selection: A measurement of the reproductive disadvantage of a given genotype in a population. If for a given genotype aa, only 99 of 100 individuals reproduce, then the selection coefficient s is 0.01.

complementation test: A genetic test to determine whether two mutations occur within the same gene.

complete linkage: A condition in which two genes are so close to each other that no recombination occurs between them.

complex locus: A gene within which a set of functionally related pseudoalleles can be identified by recombinational analysis.

concordance: pairs or groups of individuals identical in their phenotype.

de novo: Newly arising; synthesized from less complex precursors rather than having been produced by modification of an existing molecule.

discordance: nonconcordance, see above, as in twin studies.

epistasis: Nonreciprocal interaction between genes such that one gene interferes with or prevents the expression of another gene.

expressivity: The degree or range in which a phenotype for a given trait is manifested.

fragile X Syndrome: A genetic disorder caused by the expansion of a CGG trinucleotide repeat and a fragile site at Xq27.3, within the FMR1 gene.

frameshift mutation: A mutational event leading to the insertion or deletion of one or more base pairs (not a multiple of three) in a gene, shifting the codon reading frame in all codons following the mutational site.

genetic equilibrium: Maintenance of allele frequencies at the same value in successive generations. A condition in which allele frequencies are neither increasing nor decreasing.

genetic polymorphism: The stable coexistence of two or more discontinuous genotypes in a population.

genetic imprinting: A condition where the expression of a trait depends on whether the trait has been inherited from a male or female parent.

Hardy-Weinberg Law: The principle that both gene and genotype frequencies will remain in equilibrium in an infinitely large population in the absence of mutation, migration, selection, and nonrandom mating.

heritability: A measure of the degree to which observed phenotypic differences for a trait are genetic.


References

[Klug] Klug, William S. and Cummings, Michael R. "Concepts of Genetics", Fourth Edition, 1994, Macmillan College Publishing Company, ISBN 0-02-364801-5

[Motulsky] Vogel, Friedrich and Motulsky, Arno G. "Human Genetics: Problems and Approaches", Third Edition, 1997, Springer, ISBN 3-540-60290-9