Survey of Genomics Answer Key - Part 2

Ayush Noori | EduSTEM Advanced Biology

Below, please find the answers for the Survey of Genomics exercises, questions 9-13. Check your work after you’ve finished!

9. Return to the NCBI homepage and then click on PubMed (under All Databases) and search for “autosomal dominant hypercholesterolemia and ALU sequences”. Under the search bar, click on Sort by: Most Recent.

Click on the “Free PMC Article” link under the 2010 study, “Genomic characterization of large rearrangements of the LDLR gene in Czech patients with familial hypercholesterolemia” (the fourth article in the list).

a. What % of familial hypercholesterolemia is due to large DNA rearrangements?

11% of LDLR allelic variants (which presumably cause FHC) are due to large DNA rearrangements greater than 100 bp.

b. How many Alu elements are within the LDLR gene? Where are they?

There are 98 Alu repeats within the LDLR gene, 95 in intronic sequences and 3 in the 3’ untranslated region.

c. What percentage of the introns do Alu elements comprise?

Alu repeats comprise 65% of LDLR intronic sequences (note that this is contrary to the 85% statistic given on the preceding OMIM entry).

d. According to this more recent article how many Alu copies lie in the human genome?

According to this entry, Alu is the most abundant short interspersed nuclear element (SINE) in the human genome, with at least 1.3 million copies.

e. What % of the genome do they comprise?

Alu sequences comprise 10% of the genome content.

f. What is the connection between Alu elements and faulty LDLR’s? Explain.

Intrachromatid non-allelic homologous recombination (NAHR) between Alu elements is involved in mutation of the LDLR gene, which leads to FHC. Alu elements are recombination hotspots and increase the frequency of ectopic recombination which leads to LDLR mutations. I previously discussed this in the final paragraph of Page 10.
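As a toy illustration of the mechanism described above, the Python sketch below models intrachromatid NAHR between two direct repeats as the loss of the intervening sequence plus one repeat copy. The sequence, the short repeat string ALU, and the function name nahr_deletion are illustrative assumptions, not real LDLR data; real Alu elements are roughly 300 bp long and are similar rather than identical.

```python
def nahr_deletion(sequence: str, repeat: str) -> str:
    """Return what remains after recombination between the first two direct
    copies of `repeat`: one copy survives, and everything between the two
    copies is deleted."""
    first = sequence.find(repeat)
    if first == -1:
        return sequence  # no repeat present, nothing to recombine
    second = sequence.find(repeat, first + len(repeat))
    if second == -1:
        return sequence  # only one copy, so no ectopic partner
    # Keep everything through the end of the first copy, then resume
    # just after the end of the second copy.
    return sequence[: first + len(repeat)] + sequence[second + len(repeat):]


# Hypothetical toy locus: labelled "exons" separated by a made-up repeat.
ALU = "GGCCGGGCGC"
locus = "exon4" + ALU + "exon5" + ALU + "exon6"
deleted = nahr_deletion(locus, ALU)
print(deleted)  # the "exon5" segment between the two repeats is gone
```

In this simplified picture, the more repeats a gene carries (such as the Alu-rich LDLR introns), the more pairs of sites are available for this kind of deletion.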

Navigate back to the list of articles. Read the abstract for the fifth entry in the list, “Genomic characterization of five deletions in the LDL receptor gene in Danish Familial Hypercholesterolemic subjects.”

g. What deletions caused the LDLR mutations in this study?

The five intragenic LDLR deletions were caused by Alu repeats, and Alu elements flanked the sites where deletions occurred. The deletions studied by Nissen et al. occurred in the promoter/exon 1, exon 5, exon 7-8, exon 9-14, and exon 13-15.

10. Now return to the NCBI homepage and search Nucleotide (under Popular Resources) for “Drosophila melanogaster twin of eyeless (toy)”; scroll down to Items.

What is the number of bases in the twin of eyeless (toy) mRNA, transcript variant C (the first entry)?

Toy mRNA variant C is 3,870 bp in length.

a. Return to the NCBI main page and search PubMed for article # 7914031 and read the article synopsis. What surprising conclusion does the article reach?

The article concludes that eye morphogenesis (the development of eye shape and structure) is under similar genetic control in vertebrates and insects, despite the millions of years of evolution separating the two. The authors reach this conclusion because the ey (eyeless) gene in Drosophila, the Pax-6 (Small eye) gene in mice, and the Aniridia gene in humans, all homeotic genes that regulate eye morphogenesis in their respective organisms, share extensive sequence homology.

b. Return to the NCBI homepage and search all databases for human aniridia by typing in NM_001604. On this new page, record the number of bases in this mRNA (6,913 bp).

c. Click on FASTA. Now look under “Analyze this sequence” (on the right) and click on Run BLAST to go to the site that performs alignment comparisons.

d. The Query Sequence is already entered for you. Under “CHOOSE SEARCH SET” click on “others (nr etc)”; make sure that “nucleotide collection” is in the Choose Search Set.

e. Under “PROGRAM SELECTION” click on "Somewhat similar sequences (blastn)."

f. Now hit BLAST (at the bottom of the page) and wait for your results.

The BLAST algorithms produce “scores” and “E values.” Higher scores indicate longer stretches of matching sequence, while an E value is the number of alignments with at least that score expected by chance alone. Similarities with E values close to zero are therefore very unlikely to be due to chance and are most plausibly explained by homology.
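To make the relationship between scores and E values concrete, here is a minimal Python sketch using the standard approximation E = m × n × 2^(−S'), where S' is the bit score, m is the query length, and n is the total database length. The database size below is a made-up round number, and real BLAST runs use corrected "effective" lengths, so the values shown on your results page will differ.

```python
def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E value: the number of alignments with at least this
    bit score expected by chance, E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 6913   # length of the PAX6 query from this exercise
DB_LEN = 10**12    # hypothetical database size in bases (an assumption)

for score in (40, 60, 200):
    # Each additional bit of score halves the expected number of chance hits.
    print(score, expected_chance_hits(score, QUERY_LEN, DB_LEN))
```

Because the score sits in the exponent, alignments in the red ">=200" band of the color key have E values that are effectively zero for any realistic database size.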

g. How long is the human gene (query)?

The query (Homo sapiens paired box 6 (PAX6), transcript variant 2) is 6,913 bp long.
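If you would like to check a sequence length without reading it off the web page, the Python sketch below shows one way to do so. It builds an NCBI E-utilities efetch URL for a nucleotide accession and counts the bases in FASTA text. The helper names fasta_url and fasta_length and the toy record are assumptions made for illustration; the sketch deliberately does not download anything, so paste the URL into a browser to obtain the real record.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession: str) -> str:
    """Build an efetch URL that returns a nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession,
              "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fasta_length(fasta_text: str) -> int:
    """Count sequence characters, skipping header lines that start with '>'."""
    return sum(len(line.strip())
               for line in fasta_text.splitlines()
               if not line.startswith(">"))


# Toy FASTA record (not the real PAX6 sequence), wrapped over two lines.
toy_record = ">toy_sequence example\nATGCAGAACA\nGTCAC\n"
print(fasta_url("NM_001604"))
print(fasta_length(toy_record))  # 10 + 5 = 15 bases
```

Running fasta_length on the downloaded NM_001604 record should agree with the length reported on the NCBI page, provided the record has not been revised since this key was written.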

h. Look at the Color Key for alignment scores. How similar are these sequences?

The sequences are highly similar: their alignments are colored red, the band reserved for alignment scores of 200 or more (>=200).

i. What are the organisms with similar, nearly identical PAX6 genes?

Some of the organisms which have similar PAX6 genes (with a max score greater than 7000) to Homo sapiens are listed below.

  • Pongo abelii (Sumatran orangutan)

  • Macaca mulatta (Rhesus macaque)

  • Bos mutus (Wild yak)

  • Bos indicus (Zebu)

  • Odocoileus virginianus texanus (Texas whitetail)

  • Pantholops hodgsonii (Tibetan antelope)

  • Orcinus orca (Killer whale)

  • Ceratotherium simum simum (Southern white rhinoceros)

  • Eptesicus fuscus (Big brown bat)

  • Aotus nancymaae (Nancy Ma’s night monkey)

  • Myotis lucifugus (Little brown bat)

Clearly, the PAX6 gene is highly conserved across a wide variety of mammals.

j. Scroll down to the first entry in “Alignments” and click on the blue Gene label beside it. On the page that appears, click on Genomic Context, then scroll down and look under “Genomic Context” to find where the gene is located. Record that position here: (11p13).

Nucleotides? (33,170 bp in length) Exons? (17 exons)

k. While on this page look in “Summary” box to see where the gene is expressed and record that information here:

This gene is expressed in neural tissues, especially in the eye.

If you click on Phenotypes you will find a list of developmental defects caused by defects in this gene.

Mutations in this gene or in its enhancer regions can cause ocular disorders such as aniridia (absence of the iris), anophthalmia-microphthalmia syndrome (underdevelopment or lack of the eyes), coloboma of the optic disc (a hole in the optic disc), congenital ocular coloboma, foveal hypoplasia and presenile cataract syndrome (underdevelopment of the macula), irido-corneo-trabecular dysgenesis (underdevelopment of anterior eye structures), hereditary keratitis (inflammation of the cornea), bilateral optic nerve hypoplasia (underdevelopment of the optic nerve), Wilms tumor (a rare childhood kidney cancer), and Peters anomaly (underdevelopment of the anterior segment of the eye). Most of these conditions involve gross underdevelopment of eye structures, so this gene must play a critical role in the development of the eye.

l. Now go back to the FASTA sequence page from Step 10c and click on the blue UniGene label (under the Related Information column on the right). Now click on Paired Box 6. How similar is it to the PAX-6 proteins of the following organisms?

Organism                           Percent Similarity    Aligned Amino Acid Length
Mouse (Mus musculus)               100%                  435
Chicken (Gallus gallus)            100%                  435
Zebrafish (Danio rerio)            97%                   431
Fly (Drosophila melanogaster)      69.4%                 516
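Percentages like those above come from comparing two proteins position by position after they have been aligned. The Python sketch below shows one simple way to compute a percent identity from a pair of pre-aligned sequences. The function name percent_identity, the choice to skip gapped columns, and the short peptide strings are assumptions for illustration; the database's exact scoring method may differ.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned (non-gap) columns in which both residues match.
    Inputs must already be aligned, with '-' marking gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Made-up peptide fragments, not real PAX6 sequences.
print(percent_identity("MQNSHSGVNQ", "MQNSHSGVNQ"))  # identical fragments
print(percent_identity("MQNSHSGVNQ", "MQNTHSGV-Q"))  # one mismatch, one gap
```

Note that the aligned length can differ between comparisons, as in the table, because gaps and unaligned ends change how many positions are counted.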

m. Now go back to the BLAST results page. Scroll to the first entry under Alignments. Click on Genome Data Viewer (right hand side) beside the first entry to see the genes adjacent to this gene on Chromosome 11. How many base pairs are you seeing in this first view? (Approximately 32,000 bp are initially displayed.)

How many genes?

Three genes (including the PAX6 gene) in the NCBI Annotation Release 109, and four genes (including PAX6) in Ensembl Release 94.

n. What are the adjacent genes (mouse over the gene initials to view)?

The adjacent genes include:

  • elongator acetyltransferase complex subunit 4 (ELP4)

  • AL035078.4, a novel protein-coding gene

  • PAX6 antisense RNA 1 (PAX6-AS1), a long non-coding RNA (lncRNA)

  • PAX6 upstream antisense RNA (PAUPAR), a long non-coding RNA (lncRNA)

11. Now go back to the NCBI homepage and click on Structure under All Databases; then type 6Pax in the Search box and hit go.

a. Look at the result. What animal is it from? (Humans, or Homo sapiens)

b. Click on the View Structure (in the right hand box) to view the graphic. How many polypeptides are here?

There is a single Pax6 polypeptide, organized into several domains.

c. What other molecule is here?

The other molecule is a 26-nucleotide DNA sequence.

12. What is the role of this protein? Read the Citation abstract to find out. Note that the full text version of the article is available free. Summarize its role:

Pax6 is a transcription factor present during embryonic development which is crucial for the development of the eye, nose, pancreas, and central nervous system. The PAX6 gene is expressed upon closure of the neural tube and induction of the neural ectoderm by weak Sonic hedgehog and strong TGF-β signaling gradients. Pax6 is involved in the development of the olfactory bulb (processes smell), as well as the activation of other morphogenic genes in prenatal development. Postnatal expression in the eye controls the activity of various genes across diverse ocular structures.

Source: (among others)

13. Look at the molecular model and click on the “Full Featured 3D viewer”. Try selecting Style > Rendering Shortcuts > Toggle Side Chains. Now rotate this molecule with the mouse and zoom in and out (using controls at the top of the page under View).

a. How would you describe the degree of fit between the protein and the other molecule?

The Pax6 protein fits snugly into the grooves of the DNA.

b. What is the difference between the major and minor DNA grooves?

DNA contains two grooves, called the major groove and the minor groove, because the glycosidic bonds of a base pair (which link each base to its deoxyribose) are not diametrically opposite each other. The major groove is “wider” (12 Å versus 6 Å) and “deeper” (8.5 Å versus 7.5 Å) than the minor groove. Furthermore, the minor groove contains the O2 of pyrimidines and the N3 of purines, while the major groove, on the opposite side of the base pair, contains the methyl group of thymine. See Figures 27.7 and 27.8 from Berg et al., which elaborate on this (images not included).

Source: Berg JM, Tymoczko JL, Stryer L. Biochemistry. 5th edition. New York: W H Freeman; 2002. Section 27.1, DNA Can Assume a Variety of Structural Forms. Available from:

c. How do different parts of Pax-6 bind to the bases in these grooves?

Three regions of the human Pax6 paired domain bind its optimal DNA site over a 26 bp region: the amino-terminal subdomain (with its β-unit and helical unit), the linker region, and the carboxy-terminal subdomain. The linker region (residues 61-76) binds extensively to the minor groove over an 8 bp region. Residues 65-67 bind to the phosphodiester backbone, and Ile-68 is a conserved residue which binds to thymines 11 and 12 and guanine 10. Gly-69 binds to thymine 11, Gly-70 binds to guanine 13, Ser-71 binds to adenine 14, and both Pro-73 and Arg-74 bind to guanine 15 (this proline also alters the conformation of the polypeptide, as we discussed last term!). The helix-turn-helix unit of the carboxy-terminal subdomain, formed by helices 5 (residues 95-106) and 6 (residues 116-133), binds to the major groove. Two of its interactions are van der Waals contacts between arginines and the methyl groups of thymines, and two are water-mediated contacts with guanines. Finally, in the amino-terminal subdomain, Asn-47 binds to the methyl and phosphate groups of thymine 4, while Gly-48 and Lys-52 make water-mediated contacts.

This is shown in Figure 4 of Xu et al., “Diagram of DNA contacts in the Pax6 paired domain–DNA complex.” Shaded circles mark sites where Pax6 contacts the DNA backbone.

Source: Xu HE, Rould MA, Xu W, Epstein JA, Maas RL, Pabo CO. Crystal structure of the human Pax6 paired domain–DNA complex reveals specific roles for the linker region and carboxy-terminal subdomain in DNA binding. Genes Dev. 1999 May 15; 13(10): 1263–1275.

With these courses, we hope to further our mission to make high-quality STEMX education accessible for all. For questions or support, please feel free to reach out to me at

Best Regards,

Ayush Noori

EduSTEM Boston Chapter Founder


  1. NCBI PubMed

The premier source of past and present medical literature. Most supplemental information in Extensions is available via PubMed. When searching PubMed, be sure to use the “Free full text” and “Sort by: Best Match” filters to find relevant and accessible results.

  2. RCSB Protein Data Bank (PDB)

A large database of useful 3D structures of large biological molecules, including proteins and nucleic acids. Use the search bar to find a molecule of interest, which can then be examined using the Web-based 3D viewer.