Analysis of the set of GABA(A) receptor genes in the human genome.

The genes of the ionotropic gamma-aminobutyric acid receptor (GABR) subunits have shown an unusual chromosomal clustering, but only now can this be fully specified by analyses of the human genome. We have characterized the genes encoding the 18 known human GABR subunits, plus one now located here, for their precise locations, sizes, and exon/intron structures. Clusters of 17 of the 19, distributed between five chromosomes, are specified in detail, and their possible significance is considered. By applying search algorithms designed to recognize sequences of all known GABR-type subunits in species from man down to nematodes, we found no new GABR subunit is detectable in the human genome. However, the sequence of the human orthologue of the rat GABR rho3 receptor subunit was uncovered by these algorithms, and its gene could be analyzed. Consistent with those search results, orthologues of the beta4 and gamma4 subunits from the chicken, not cloned from mammals, were not detectable in the human genome by specific searches for them. The relationships are consistent with the mammalian theta subunit being derived from the beta line and epsilon from the gamma line, with mammalian loss of beta4 and gamma4. In their structures the human GABR genes show a basic pattern of nine coding exons, with six different genomic mechanisms for the alternative splicing found in various subunits. Additional noncoding exons occur for certain subunits, which can be regulatory. A dicysteine loop and its exon show remarkable constancy between all GABR subunits and species, of deduced functional significance.

The GABA(A) receptors are GABA-gated ion channels that are well known to mediate most of the inhibitory signaling in the vertebrate central nervous system. They also operate at a number of peripheral sites. These receptors have been linked to a variety of human pathologies and developmental abnormalities (1,2) and to human cognitive pathways (3), and they are the targets of a significant overall percentage of current pharmaceutical usage and design (4,5). For the pursuit of all of these aspects of this receptor activity, a full knowledge of the GABA(A) receptor genes in the human genome, knowledge which hitherto has been only partial, will clearly be an important aid.
The GABA(A) receptors are formed as combinations of homologous subunits (6). The subunits are from 420 to 632 amino acids in length, and 19 of them (Fig. 1) have been cloned from mammals but not all from man (reviewed in Refs. 7 and 8), each clearly being attributed to a different gene. This repertoire of subunits is drawn upon in a restricted combinatorial system (7,8) to assemble as pentamers (9), which are the active receptors. The set of 19 GABA(A) receptor (GABR) subunits is the largest of any among the mammalian ion channel receptors, and the arrangement in the genome of their genes, as will be discussed, is unusual and can itself be of intrinsic significance. The mapping of these genes began in 1989 with the assignment of α1 to chromosome 5q34 and α3 to Xq28, and most significantly, α2 and β1 were found together on 4p12 (10,11). The distribution of 14 of these genes in clusters on human chromosomes, as known from further cytogenetic, linkage, and genomic cloning studies, was reviewed in 1999 (12,13). Four GABR genes were known to be close together on chromosome 4, four on chromosome 5, three on chromosome 15, and three on X. The π subunit gene was reported to occur separately on chromosome 5q. The order of the genes within the clusters was not always unequivocally known by the methods previously available.
From analysis of the human genome data base and with some experimental studies based on it, we can now address a number of issues that have been outstanding, as follows. 1) Are there further subunits of human GABA(A) receptors that, due to a greater divergence, have gone unrecognized? 2) Not every one of the genes for the 19 subunits as known from rodents and man combined had been previously found and located in the human genome. Furthermore, it is only now that the detailed locations of all of the human GABA(A) receptor (GABR) genes can be fully specified, together with the precise arrangement within each of the five gene clusters that are present. 3) The cluster of three GABR genes on chromosome 15q, in particular, is incompletely known, due to residual refractory gaps there in the genome mapping and other ambiguities, so further investigation of this cluster was needed. 4) Certain functional GABA(A) receptor subunits described from nonmammalian vertebrates do not appear to correspond to any known mammalian subunit, notably chicken β4 and γ4. Similarly, from invertebrates a homologous functional GABA-gated cation channel is known. Can human orthologues of those nonmammalian types now be found? 5) Are there common features of the exon/intron organization across the entire family of GABR genes? 6) The genomic origin of alternatively spliced GABR mRNAs was not often defined previously. A surprising variety of genomic mechanisms for creating multiple forms of the GABR subunits can be specified.

EXPERIMENTAL PROCEDURES
Consensus Sequence Searches-The amino acid sequences in and immediately neighboring the Cys loop (as defined under "Results") were aligned for each receptor family as specified. For the vertebrate GABR family, 87 GABR subunit full-length independent sequences from vertebrate species were used in determining a consensus Cys loop motif for that set (form A, as defined under "Results"). Later the total of those sequences, including others from the nonmammalian vertebrates that could be retrieved and verified from the latest EMBL and trEMBL data bases, rose to 100, and it was found that all 100 matched the form A consensus. Taking positions conserved in all of the GABR sequences, a GABR Cys loop consensus was written in nucleotide form, including all codon redundancies and with bidirectionality. This consensus was used as a position-specific profile to write an algorithm for a search program (available on request), as described under "Results," designed to be run against the human or any other genome data base. For each of the other three Cys loop receptor families considered, all the human sequences were aligned to obtain three family-specific forms of the consensus. In another case where invertebrate GABR-like sequences were also considered, those were aligned in protein form to identify the consensus around the Cys loop region. In all of the GABR sequences, vertebrate and invertebrate, this agreed completely with a second minimal form of the consensus (form E, as defined under "Results"); form E was also applied to the human genome search. This type of search program registers as hits only cases where all of the specified positions match and does not require probability judgements. The hits in the GABR searches were identified in their contig locations if on known subunits and where on an unknown sequence this was identified or characterized by a BLASTN search of its contig.
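The search principle described above (a protein consensus written out in nucleotide form with all codon redundancies, applied in both orientations, and registering only complete matches) can be illustrated with a minimal sketch. This is not the authors' program, which is stated to be available on request. The consensus string `C*PC`, the toy DNA string, and all function names are hypothetical and chosen only for illustration; `*` stands for any amino acid, as in Fig. 2.

```python
import re

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def consensus_to_regex(consensus):
    """Expand a protein consensus into a regex over all synonymous codons.

    '*' means any amino acid; here it is expanded to any triplet, so
    hits with a stop codon at a wildcard position must be removed later.
    """
    parts = []
    for aa in consensus:
        if aa == "*":
            parts.append("[ACGT]{3}")
        else:
            synonymous = sorted(c for c, a in CODONS.items() if a == aa)
            parts.append("(?:" + "|".join(synonymous) + ")")
    # The lookahead lets overlapping matches be reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def search(dna, consensus):
    """Return (strand, start, matched_nt) for each exact match on either strand.

    Starts are 0-based; for '-' hits they index the reverse-complemented string.
    """
    rx = consensus_to_regex(consensus)
    hits = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        hits.extend((strand, m.start(), m.group(1)) for m in rx.finditer(seq))
    return hits


TOY_CONSENSUS = "C*PC"           # hypothetical, NOT form A or form E
toy_dna = "AAATGTGGGCCATGCAAA"   # made-up sequence
print(search(toy_dna, TOY_CONSENSUS))
```

Because every specified position must match one of its codons exactly, a hit is either reported or not, with no score or probability attached, which is the property emphasized in the text.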
Sequence Alignments and Other Search Methods-Full DNA and protein sequences were aligned and analyzed by using the Vector NTI Suite 8.0 sequence analysis package (Invitrogen). For the dendrogram construction, an algorithm based on a Neighbor-joining method was used, with evaluation by bootstrapping to 100,000 replications, plotted (unrooted) by the NJ-Plot program. A parallel tree, constructed instead by a Maximum Parsimony method with 10,000 bootstrap replications, gave a topology that overall was quite similar, but this showed low bootstrap scores (53 and 60%) at two of the final branch points and was not used. For the search applications, where not specified, on-line software packages, FASTA, WU-BLAST 2.0, NCBI-BLAST 2.0 or ClustalW, from EBI, NCBI, or the Wellcome Trust Sanger Institute (Hinxton, UK), were used. The significance of hits in BLAST or FASTA searches was assessed by their p values (as defined in the BLAST program), with a cut-off at e−3. Signal peptide cleavage sites were taken from the original publications where predicted or, if not, from predictions in data bases where given; otherwise they were assigned by standard methods so that the mature protein sequences were always used. For prediction of exon/intron boundaries the following splice site consensus sequences were used: for the most commonly occurring GT-AG type introns, GT(A/G)AGT at the 5′-splice site and (C/T)AG at the 3′-splice site; for the minority AT-AC type introns, ATATCCT(C/T) at the 5′-splice site and (C/T)AC at the 3′-splice site (14). For the sequences used from invertebrates, for Drosophila sp. or Caenorhabditis elegans, searches were made in the Flybase or Wormbase data bases, respectively, for reported anion-channel receptor-like sequences, and those showing overall clear homology to vertebrate GABR subunits were selected. For Anopheles gambiae the EBI completed genome in Ensembl was used similarly. For other invertebrates the Pasteur Institute LGIC classification (15) was used, and this was much extended by searches made with the same criteria in the Uniprot/TrEMBL data base.
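The splice site consensus sequences quoted above can be applied as simple pattern scans. The following sketch only illustrates that idea and is not the procedure actually used (exon prediction here relied on Grail Express and cDNA alignment). The function names, the toy sequence, and the `min_len` cut-off are assumptions introduced for the example.

```python
import re

# Consensus motifs as quoted in the text, written as regular expressions.
SPLICE_MOTIFS = {
    "GT-AG": {"donor": "GT[AG]AGT", "acceptor": "[CT]AG"},
    "AT-AC": {"donor": "ATATCCT[CT]", "acceptor": "[CT]AC"},
}


def find_sites(seq, motif):
    """0-based start positions of every (possibly overlapping) motif match."""
    return [m.start() for m in re.finditer("(?=" + motif + ")", seq)]


def candidate_introns(seq, intron_type="GT-AG", min_len=20):
    """Pair each 5' site with every downstream 3' site of the same type.

    Returns (start, end) pairs such that seq[start:end] is the candidate
    intron. min_len is a hypothetical cut-off, not a value from the paper.
    """
    motifs = SPLICE_MOTIFS[intron_type]
    acceptor_len = 3  # both acceptor motifs are three nucleotides long
    donors = find_sites(seq, motifs["donor"])
    acceptors = find_sites(seq, motifs["acceptor"])
    return [(d, a + acceptor_len)
            for d in donors for a in acceptors
            if a + acceptor_len - d >= min_len]


# Made-up sequence: 2 nt of exon, a 29-nt GT-AG intron, 2 nt of exon.
toy = "CC" + "GTAAGT" + "T" * 20 + "CAG" + "GG"
print(candidate_introns(toy))
```

A scan of this kind produces many false candidates on real genomic DNA, which is why it is combined in practice with cDNA evidence or an exon prediction program.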
Exon Analyses-The subunit cDNA sequences retrieved from EMBL data bases were run against the human genome or against relevant contigs to identify exons, except in cases where there was sufficient independent published evidence, as cited here, to confirm a genome browser automated analysis of exons. Where cDNA information was incomplete, the Grail Express exon prediction program (originally from Oak Ridge National Laboratories) within the Vector NTI package was applied for exon scanning, combined with splice site analysis.
Clones from Chromosome 15 DNA-Probe sequences b-f were sequences of clones derived from flow-sorted chromosome 15 DNA and then selected by their relationship to reference markers in the GABR cluster region as described previously (16,17). Three of the clone sequences were used intact as follows (with their insert size): b, 149 (22 kbp); c, 329 (9.1 kbp); and f, 84 (18 kbp). Probes d (2 kbp) and e (0.8 kbp) were both derived from clone 327, by subcloning fragments of its restriction digests (into pBS/SK+), as described by Glatt et al. (17). Some shorter sequences for probe d showed overlaps, and these were first aligned to form contigs. Probe a (2.8 kbp) was the sequence of an Image EST clone (H86104, purchased from Genome Systems Inc); this was identified from the overlap of its 5′-end with the 3′-end of clone 404, which Glatt et al. (17) had found maps in this region.

RESULTS AND DISCUSSION
The genes encoding 19 subunits already known from their cloned and expressed cDNAs, as reported either from man, or from rat or mouse (where previously not identified in man), were located in the human genome, as described below. The full-length human protein sequences verified there were used to construct a dendrogram of them all (Fig. 1). Only the mature polypeptide sequences (see "Experimental Procedures") were used in the multiple alignment. In this plot the ε, θ, π, and δ subunits are relatively isolated, but ε appears as related more to γ subunits than others, θ as more related to β subunits, whereas δ and π each form a separate single subunit subgroup (Fig. 1). The six questions raised in the Introduction were considered in relation to this family of genes.

Are There Previously Undetected Subunits of GABA(A) Receptors Encoded in the Human Genome?
Conventional nucleotide/protein BLAST searches of the genome with known GABR cDNA sequences gave no significant hit other than on known GABR subunit or glycine receptor α subunit (GLRA) sequences (data not shown). As the query here the human α1, α4, , and and the chicken β4 sequences were each applied; between them these cover a great diversity of sequence within this family, as shown by Fig. 1.
FIG. 1. Dendrogram of the human GABA(A) receptor subunits. Sequences used are those of the mature protein after signal peptide removal. A distantly related outgroup marker, the nAChR δ subunit (as shown), was used for refinement of the plot. The length of horizontal branches connecting any two GABA(A) receptor subunits represents the fractional divergence in their amino acid sequence. The scale bar corresponds to 20% sequence divergence. Numbers at the junctions are bootstrap values for 100,000 replications.
The absence of alignment in the genome with known GABR subunit sequences is insufficient, however, to definitely exclude unknown GABR genes if very divergent, because if a gene were homologous enough for known GABR exons to align well in the genomic DNA, it should already have been found by DNA cross-hybridization and screening of libraries. Therefore, we sought to define the minimum consensus sequence elements that would constitute a signature of all GABA(A) receptors. We collected and aligned 100 reported full-length sequences of GABA(A) receptor subunits of vertebrates, from man and other mammals, birds, amphibia, and fishes. We analyzed these for common elements and found, as has been concluded earlier from much smaller samples (18-20), that the only motif universal in them is the "Cys loop." This is a 15-residue disulfide-bridged segment, always found toward the middle of the large extracellular (agonist binding) N-terminal domain. A consensus GABR-specific sequence for the framework in and close to this loop structure was found (Fig. 2A) that applies without exception across all known vertebrate GABA(A) receptors. Hence, the GABA(A) receptor Cys loop consensus was selected as the basis of a second level search of the human genome.
However, some form of the 15-residue Cys loop is known to occur similarly in the other members of the superfamily of related transmitter-gated ion channels (18,21) and indeed has given its name to that superfamily (18,19,22). When we aligned the human non-GABA(A) receptor members of this superfamily now collected (see LGIC data base from Ref. 15) plus the most recently cloned members (e.g. new 5-HT3R subunit and ZAC), the alignment showed that a different consensus in the region of the Cys loop can be defined for each family, suitable for screening each selectively (Fig. 2, A-D).
A position-specific algorithm for a genome search for GABA(A) receptor sequences was constructed, initially using the amino acids of the consensus A of Fig. 2. These were then expressed as nucleotide sequence, with full inclusion of codon redundancy. Recorded hits in this system are clear total matches, not probabilities. The present form of this probe depends upon the consensus lying within one exon. That situation was confirmed to hold in analyses (to be described below) of the genomic structures of all human GABR subunit genes and in similar examinations of the GABR genes from analyzed nonmammalian vertebrates.
A genome search made with consensus A gave no match at any unknown sequence, although it retrieved all of the predicted human GABR genes. Therefore, we addressed the question whether some unidentified human GABR subunit sequence might exhibit a change in the Cys loop region that is so aberrant that it has not occurred in any of the 100 known vertebrate cases. In the general case, six amino acid positions is the minimum number that we found could be specified in such a position-specific algorithm without producing a prohibitive level of random hits in a genome search. We sought a 6-point consensus that would cover, in addition, known invertebrate GABR sequences, where greater variation occurs and where any other ancestral GABR line might be revealed. In the invertebrates the pharmacological distinctions seen between the GABR subunits in vertebrates have been found not to be generally applicable, and the distinction between GABA and glycine receptors can become blurred in some cases (23). An alignment was then made of the known invertebrate GABR-like proteins using the available full-length subunit sequences of the anion channel type that have a discernible overall homology to human GABR subunits. For most of those, expression and GABA responses have been reported, but some have not been tested, e.g. those in the recently released full genome (EBI data base) of the A. gambiae mosquito. For the many putative GABR-related subunit sequences (~40) encoded in the completed C. elegans genome caution is needed, because evidence on their activity in assembled GABA(A) receptors is as yet mostly lacking and the exon organization, where deduced, can differ considerably from that of known GABR genes in other phyla; some there could not yet be classified. For all C. elegans sequences that could be assigned as possible GABR-related subunits, as well as for 29 other invertebrate cases (from molluscs, crustacea, insects, arachnids, and parasitic worms), a GABR Cys loop region 6-position consensus E (Fig. 2) was found to be completely conserved.
Consensus E thus makes the smallest number of assumptions in this series and was used in the final screen. It is comprehensive, so it will also see some other members of the superfamily (Fig. 2), but their features elsewhere in the sequence will be readily recognized. This search yielded 118 hits on sequences present in the human genome. 41 were on known Cys loop subunits as follows: 17 on GABR, 14 on nAChR (discussed below), 5 on GlyR, and 4 on 5-HT3R subtypes. The 41st match led to the human GABR ρ3 subunit gene, previously not characterized and now detailed below. A 19th known GABR subunit, γ3, gave no hit here because the region containing the γ3 Cys loop is still incomplete in the current human genome data base, as explained below. 49 of the other hits could be discarded, due to multiple stop codons in the apparent Cys loop sequence or to a clear location within an intron of a known gene or to the consensus nucleotides occurring in simple tandem repeats.
FIG. 2. The Cys loop consensus sequences for the four families of transmitter-gated ion channels are shown. Asterisks represent any amino acid. 100 known vertebrate GABA(A) receptor subunit full-length sequences were aligned, to give consensus A. To cover both vertebrate and invertebrate GABA(A)-like sequences, the 6-point consensus E was required. Not included here is a recently discovered (75) branch of the Cys loop superfamily, with one known member, ZAC. That will not interfere in searches here, because its Cys loop region has two or more changes from any consensus shown here.
This left 28 possible candidates. All of these were located very far from any known human GABR gene. Each was analyzed. By using the most general consensus sequences for splicing sites (see "Experimental Procedures"), where exons were predicted none was at (or even close to) the 83 bp in length which was always found in known GABR Cys loop exons, as analyzed below. In the hit regions, the predicted exon was either much shorter or longer, ranging up to 109 bp. Five of the candidates could be discarded because they again contained a stop codon within the consensus Cys loop sequence or had a repeat of the consensus in tandem, with no separating exon boundary. In a sixth case, the orientation of the hit sequence was opposite that of the coding DNA at that point. For the rest, despite their varying exon length, a detailed analysis was nevertheless made by the Grail Express genomic structure prediction program. This showed that the candidate again either lies within an annotated known gene, always from a very different family and always in a known intron therein or spanning an intron boundary, or it lies in a predicted intron of an unidentified gene, which in any case has a total coding length much too small for a GABR subunit. All of those 28 hits were chance nonspecific alignments. Where the hit was on an intron large enough to contain an entire GABR gene, the Grail Express analysis was further applied separately on that intron to show that no smaller genes were enclosed and to confirm that no exon with the consensus Cys loop sequence was present. In sum, every one of the 118 hits from the minimal consensus sequence E could be accounted for, and no previously unidentified human GABR subunit gene could be detected in the genome.
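The accounting of the 118 hits, and one of the discard criteria (a stop codon inside the apparent Cys loop sequence), can be restated as a short check. The counts are those reported in the text; the helper function and its name are assumptions made for illustration and do not reproduce the authors' workflow.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_inframe_stop(nt):
    """True if any complete codon of nt, read from position 0, is a stop codon."""
    return any(nt[i:i + 3] in STOP_CODONS for i in range(0, len(nt) - 2, 3))


# Hit counts as reported in the text for the consensus E search.
total_hits = 118
known_subunit_hits = {"GABR": 17, "nAChR": 14, "GlyR": 5, "5-HT3R": 4}
rho3_hit = 1          # the 41st match, leading to the rho3 gene
discarded_hits = 49   # stop codons, intronic locations, tandem repeats

on_cys_loop_genes = sum(known_subunit_hits.values()) + rho3_hit
remaining_candidates = total_hits - on_cys_loop_genes - discarded_hits
print(on_cys_loop_genes, remaining_candidates)
```

The totals agree with the text: 41 hits on Cys loop subunit genes and 28 candidates left for detailed analysis.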
As noted above, the general GABR consensus sequence E also recognized 14 human nAChR subunits. 16 nAChR subunits are currently recognized in mammals: α1-7, α9-10, β1-4, γ, δ, and ε, with an α8 sequence known so far from chicken but not from man (for sequences see LGIC data base (15)). In all vertebrate β1 nAChR genes known, exceptionally, the Cys loop lies in 2 exons. The consensus B of Fig. 2 matches the protein sequence of all 16 known human nAChRs (plus the avian α8); when that was used in the genome search it retrieved all known human nAChRs, except β1, but (apart from two pseudogenes) not an α8 and no other gene. We confirmed the absence in the human genome of a sequence like the chicken nAChR α8 by a BLAST search with that α8 sequence, with negative results. For comparison, in the pufferfish (Fugu rubripes), 28 expressed nAChR subunit sequences (24) were recently found (all of which would be found by consensus B of Fig. 2). Like the high multiplicity of nAChR-like sequences and GABR-like sequences seen in the C. elegans genome, this suggests that diversification in the nAChR and GABR series is notably less in mammals than can occur in some lower species. Whether the known mammal-specific interspersed elements (e.g. LINE-1) could operate to restrict such diversity merits investigation.

The GABR Form of the Cys Loop Is Present in Invertebrate GABA-gated Cation and Glutamate-gated Anion Channels
A GABA-gated cation channel, EXP-1, has been identified recently from C. elegans, which is homologous to the vertebrate GABR subunits and not to vertebrate AChR or glutamate receptors (25). EXP-1 has, however, a cation-selective pore segment. Evidence that similar responses occur in certain rat brain neurons was cited (25). The Cys loop in EXP-1 still matches our GABR consensus E. It even matches the vertebrate GABR Cys loop fuller consensus A, apart from the initial Thr being Ser, a common minor variation in C. elegans, but not found in Cys loop consensus forms for the other cation channel receptor families, nAChR, 5-HT3R, and ZAC. Hence, we searched by BLAST with the EXP-1 coding sequence the full human genome and also the newly completed mouse genome.
All significant hits not found on known Cys loop genes were analyzed by the Grail Express program and shown to be from accidental similarities in noncoding sequences. Hence, GABA-gated cation channels cannot be found in these mammals.
Conversely, glutamate-gated anion channels are known from C. elegans, from parasitic nematodes, and from Drosophila (26-28). That subfamily also maintains overall homology to vertebrate GABA(A) receptors and is structurally totally different from the family of glutamate-gated cation channels (26). Those invertebrate glutamate-gated anion channel sequences also possess the GABR-type Cys loop consensus E. From these comparisons and those noted above for invertebrate GABR-like receptors of varied agonist specificities, we concluded that this specific form of a Cys loop framework is diagnostic for the basic GABA(A) receptor structure and independent of the variations in the binding site or in the pore structure.

Locations and Clusters of All GABA(A) Receptor Genes
Chromosome 4 Cluster-This contains two α, one β, and one γ genes, confirming results from earlier methods (29). We can now specify their relationships (Fig. 3A). The order of these along the chromosome is the opposite of that deduced by Russek (13) but is as mapped by Bailey et al. (12). The differing orientations of transcription of the four genes, however, confirm those deduced by Russek (13) on genomic DNA clones. No genes of other types lying between the γ1 and α2 genes can be detected, but the wide region that separates the α2 and α4 genes contains the cytochrome oxidase 7B2 gene and at least one other unidentified (but non-GABR) gene. The gap between the α4 and β1 genes is the smallest here and contains no other gene.
Chromosome 5 Cluster-The two α, one β, and one γ genes here (13,30) are organized as shown in Fig. 3B. Their locations and separations are now corrected. Only one non-GABR gene (a member of the glutaredoxin family) has been detected so far in this cluster, between α6 and α1.
Revised Specification of the Chromosome 15 Cluster-The start of this cluster lies in band q13.2; it is known to contain (16) an α, a β, and a γ gene (Fig. 4A). The first is the GABRB3 gene. This had been found previously to contain 10 exons (17), of which the first two are alternatively expressed (31). These 10 exons were confirmed across three adjacent contigs in the genome path. The nine introns range in size from 100 to 175,000 bp.
The region of this cluster unfortunately contains two of the rare persistent gaps in the genome tile path, presently described as "regions whose sequence cannot be reliably resolved with current technology" (e.g. UCSC Genome Gateway, latest release, genome.ucsc.edu). Due to these gaps, the exon-intron structure is not yet fully covered in the genome data base for the α5 and γ3 GABR genes. This causes important omissions or errors, even well beyond the gaps themselves, within the annotations of those two genes in the genome data bases. Therefore, we have used independent clones of genomic DNA segments obtained from this region, together with the known α5 and γ3 cDNA sequences, to map in more detail than before the exon-intron structures of GABRA5 and GABRG3 and the gene separations in this cluster (Fig. 4). The GABRA5 gene was found previously to have four exons in its 5′-utr region (32). To determine the exact location of these genes along chromosome 15 and to check for any others, we used selected phage clones of human chromosome 15 DNA (see "Experimental Procedures"). We found that some sequences from these phage clones aligned, with 95-100% matches, within two new larger contigs recently deposited by the Whitehead-MIT Genome Sequencing Group into the GenBank and Ensembl genome data bases (AC135999 cosmid and AC145196 fosmid; see Fig. 4B). To place the fosmid AC145196 (NT_079554) on the map, we aligned it with the AC135999 cosmid (already mapped) and found an overlap (24.1 kbp) as represented in Fig. 4B. Alignment of the α5 cDNA sequence with our probe sequence b showed that b contains the first five exons of the GABRA5 gene. Probe b lies within the AC145196 fosmid sequence, and exons 6-8 of α5 were also found here (Fig. 4B). We located a remaining gap (27.7 kbp) that interrupts the present genome tile path, but we found that it lies wholly within intron 8.
Intronic probe a, derived like the other probes from GABRA5 genomic DNA, was found to be within intron 8; probe a overhangs at its 3′-end by 892 bp the 5′-end of AC145196 and extends into the gap for 1934 bp, reducing that gap to 25.8 kbp. Sequences of two other clones (not marked in Fig. 4) which should from their origin be close to probe a gave no match at all in the genome, so they may be from intron 8 but within the gap. Probes c-f were shown to align in cosmid AC135999 (Fig. 4B). Probe c (our 329 sequence) was shown also to overlap by 2.4 kbp with the next contig, AC145196. Probe c was used to locate the 5′-end of the GABRA5 gene and to exclude any further 5′-utr noncoding exons there. Probes d-f were located outside the GABRA5 gene. By combined use of those clones and alignments of the exon sequences derived earlier from genomic DNA clones (17,32), we could locate precisely all of the exons of GABRA5 in the genome (Fig. 4C). This confirmed the number of the exons as 13. The sizes of the exons and introns are exact or close matches to the estimates in the previous results (17,32), except that we can now specify the lengths of the noncoding exons 1-3 (each previously given as a minimum) and of intron 8. The translation starts for the GABRA5 protein at nucleotide 74 of exon 5 (24,661,758 bp on the chromosome). The transcriptional start of the GABRA5 gene (24,659,419 bp on the chromosome) has now been located at 93,260 bp downstream from the GABRB3 gene; the size found for the GABRA5 transcription unit was much increased over earlier estimates to 97,355 bp.
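Two of the figures in this paragraph can be cross-checked by simple arithmetic on the values quoted: the size of the gap remaining after probe a is placed, and the separation between the transcriptional and translational starts of GABRA5. The variable names below are illustrative only, and the 27.7 kbp gap size is itself a rounded value, so agreement is expected only to about 0.1 kbp.

```python
# Values quoted in the text (bp); 27.7 kbp is itself a rounded figure.
gap_bp = 27_700
probe_a_in_gap_bp = 1_934
remaining_gap_bp = gap_bp - probe_a_in_gap_bp

# Chromosome 15 coordinates quoted in the text for GABRA5.
transcription_start = 24_659_419
translation_start = 24_661_758
offset_bp = translation_start - transcription_start

print(round(remaining_gap_bp / 1000, 1))  # remaining gap in kbp
print(offset_bp)                          # separation of the two starts in bp
```

The remaining gap comes to about 25.8 kbp, as stated, and the two start sites are 2,339 bp apart in chromosome coordinates.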
The γ3 gene was previously shown, using standard cloning methods for exonic sequences, to contain 10 exons (GenBank accession number AF228453). However, their arrangement in the genome within 15q13 was not determined until now, due to the absence (as noted above) of exon 5 and most of introns 4 and 5 in the present tile path of the genome and the consequent perturbation of the annotations for the rest of the gene. We found three contigs (Fig. 4A) covering this region (apart from the gap). By using BLASTN to interrogate them with the γ3 cDNA sequence, we could identify nine exons of that gene, all within those contigs. Exon 5, which includes the Cys loop, was not retrieved (because it lies in the present gap) but only inferred from the cDNA sequence. The translated sequence resumes from exon 6 to its end, and the gene is deduced to have 10 coding exons (Fig. 4D). Exon 1 encodes only the signal peptide plus 16 bp of 5′-utr; a preceding noncoding 11th exon has not been excluded.
The X Chromosome Cluster-This cluster is located in band Xq28 and comprises the genes of the ε (GABRE), α3, and θ (GABRQ) subunits. That band location is as found for GABRE (33), for α3 (10,11), and for θ (34,35). We have derived the precise locations, separations, and sizes of the genes there (Fig. 3E). There was disagreement previously on the order of these three genes relative to the centromere (cf.  (Fig. 3E) is that predicted (33) from PCR mapping on YACs and cosmids containing X chromosome markers. The two intergenic distances are 2- and 2.8-fold lower than those estimated by the earlier methods. Nevertheless, these three genes are separated by four or more unrelated genes.
The θ and ε Genes and Their Relationship to the β4 and γ4 Subunits-Much change in the evolution of θ, from the variation in amino acid sequences seen between the mouse and human θ subunits (exceptional for the highly conserved mammalian GABR family), has been proposed by Sinkkonen et al. (36). A unique feature of the mammalian θ subunit sequences is a long insertion in the second intracellular loop before the last transmembrane domain (one addition of 86 amino acids, plus two smaller insertions nearby, in human θ). We noted that when this specialization is subtracted, the mature θ subunit shares 50% amino acid identity with the chicken β4 sequence. Most surprisingly, the latter is its closest relative in the entire GABR family known from vertebrates. Indeed, the θ gene was originally termed human β4 (13,33,35) because its structural similarity to chicken β4 was recognized. In heterologous expression θ resembles known β subunits in forming an αθ combination at the cell surface (and not βθ), but differs in that this combination does not produce active receptors until a γ subunit is added (34).
Only three human β subunits have been found by cDNA cloning, having mostly similar properties when in receptor combinations (5,7); furthermore, no θ subunit has been found in any submammalian species. Therefore, we asked whether a human β4 exists undetected in the genome or whether the mammalian θ subunit could have evolved as a modified subtype from the β4 subunit line. The chicken β4 subunit DNA sequence was used to search the full human genome for homologous sequences. The β2, β3, β1, and θ genes were thus identified as the closest relatives (in that order), followed less closely by other known GABR genes. Even down to nonsignificant alignments, however, no unknown gene that could be a GABR gene was found, i.e. a human β4 cannot be found.
It is therefore possible that, as was suggested previously (36), the β4 subunit has evolved into θ, although by its nature this hypothesis cannot be directly established. When the chicken genome is completed, it can be tested whether a θ gene is found there. We have observed (data not shown), however, that the exon/intron organization of the human θ gene is exceptionally similar to that of the chicken β4 gene. The exon lengths and the positions of the boundaries are identical in θ and β4 (disregarding the first exon containing the signal peptide and the final exon where the unique 86-residue insertion in θ falls, as noted above). The boundary amino acids are totally conserved (other than one Ile/Leu change). The 16 exonic nucleotides around each boundary show overall only 24% change between these two subunits, compared with 36% or more from chicken β4 to the three human β subunits and much less than that between chicken β subunits and other human non-β subunits. Hence, θ behaves as being in the β family in its gene organization and is closest to β4 there.
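The boundary comparison above reduces to counting changed nucleotides in short aligned windows taken at each exon boundary and pooling them over the gene. The sketch below shows that calculation on made-up 16-nucleotide windows. The sequences, the function name, and the way the 16 nucleotides are positioned relative to each boundary are assumptions, since those details are not given in the text.

```python
def boundary_divergence(windows_a, windows_b):
    """Pooled fraction of changed nucleotides over paired boundary windows.

    windows_a and windows_b are lists of aligned, equal-length strings,
    one pair per exon boundary.
    """
    changed = total = 0
    for a, b in zip(windows_a, windows_b):
        if len(a) != len(b):
            raise ValueError("paired windows must have equal length")
        changed += sum(x != y for x, y in zip(a, b))
        total += len(a)
    return changed / total


# Hypothetical windows for two boundaries in two genes (not real data).
gene_1 = ["ACGTACGTACGTACGT", "TTTTCCCCGGGGAAAA"]
gene_2 = ["ACGTACGAACGTACGA", "TTTACCCCGGGGAAAT"]
print(boundary_divergence(gene_1, gene_2))
```

In this toy case four of the 32 compared positions differ, a pooled change of 12.5%; the same pooled measure applied to real boundary windows is what the percentages quoted in the text express.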
The human ε gene has chicken γ4 as its nearest vertebrate relative by overall sequence identity, slightly closer than human γ3. Three γ subunit genes are known in man (Figs. 3 and 4) and also in the chicken, where no γ3 subtype has so far been found but a γ4 subunit has, which in functional properties is quite distinct from other γ subunits and hence has been regarded as a fourth γ subtype (37,38). Avian γ4 has relatively low protein sequence identity (64-67%) with human γ subunits and much less with all other human subunits except ε. We searched the full human genome with the chicken γ4 sequence; the closest partial alignment was with the γ2 gene but ε was close, followed by γ3 and γ1, and at much lower p values, a variety of other GABR and GLR subunits, and then weak hits on noncoding sequences.
It is not known if ε exists outside the mammals. Sinkkonen et al. (36) suggest that the avian γ4 may have evolved into mammalian ε, with a change (both in sequence and in function) that is unusually rapid for GABR subunits. The organization of the three GABR genes (ε-α3-θ) on the human X chromosome (Fig. 3E), plus their same order found on the mouse X chromosome (35,39), when compared with the other clusters would be consistent with ε being derived from the γ line and θ from the β line of GABR genes of an ancestral chordate. However, species specialization within the γ3 subgroup might have occurred independently.
In summary, we can now say that no new human GABR gene, which could be a true orthologue of γ4 or of β4, can be found. The latter conclusion is reinforced by the absence of any new β- or γ-like sequences in the general search (above) using the Cys loop consensus. Although an absolute negative conclusion cannot be proven here, for all of these reasons we conclude that there is an extremely low probability that any previously unknown GABR gene is present in the human genome.
The ρ Subunit Genes-Fig. 1 shows that these lie in a separate group, sharing only 28-38% amino acid sequence identity with the other subunits. The human ρ1 and ρ2 genes (GABRR1 and GABRR2) were both earlier mapped to the chromosome region 6q13-q16 (40,41). A ρ3 subunit has been known only in the rat (42).
The locations of the GABRR1 and GABRR2 genes can now be seen in the genome to be at 6q15 and are in fact very close, with no recognized intervening gene (Fig. 3D). As described above, we located with the consensus A screen a human GABR ρ-type gene, lying on chromosome 3. To obtain the full gene there, we searched the human genome with the rat GABRR3 protein sequence of Ogurusu and Shingai (42) as a query sequence (TBLASTX), and we retrieved at a high probability level nine ρ-type exons. These were back-translated into DNA sequences, and with these new probes we searched the human genome again (BLASTN), and we located all of them as exons of one complete gene (Fig. 5B). This lies within the AC026100 contig at the point which we had located, by the consensus A screen, on chromosome 3; its position is detailed in Fig. 5C. Fig. 5A presents the 467-amino acid sequence of the human ρ3 subunit. It includes a stretch encoding 99 amino acids with homology to rat ρ3, which had been noted on human chromosome 3 as within a noncoding sequence (41). The human ρ3 shares 81% identity with the rat ρ3 subunit, with the changes being mainly in the signal peptide and in a 21-residue segment following from that, as well as in a 26-residue segment in the last intracellular loop.
Analysis of the genome now allows us to compare the genomic organization for the three human ρ subunits. The ρ3 gene has nine exons, shown in Fig. 5B. The ρ2 gene also has nine exons; corresponding exons in ρ2 and ρ3 are of essentially identical lengths. It is also interesting to find in the genomes of the mouse and rat a ρ2 gene (RefSeq accession numbers NM_008076 and NM_017292, respectively), each again adjacent to a ρ1 gene and again with nine exons for ρ2. The nine ρ2 exons are of corresponding sizes and are virtually identical in amino acid sequence in man, mouse, and rat. Their exon boundaries also show a high degree of conservation in all three ρ2 genes. The human ρ1 gene, however, has 10 exons; the extra exon is inserted as an exon 2, with exons 3-10 of ρ1 being homologous and equivalent to exons 2-9, respectively, of the ρ2 and ρ3 genes. This pattern is conserved in the rat and mouse ρ1 genes, each having 10 exons, and their respective coding lengths are again identical in the three species. Indeed, in all three ρ subtypes in man, all of the exons (apart from the first and last, which contain some untranslated sequence, and excluding the extra exon in ρ1) do not differ by more than four nucleotides in their corresponding lengths. Hence, despite the great conservation of the genomic organization throughout this set, it is the two genes that are adjacent on one chromosome (ρ1 and ρ2) that differ in exon number, thus suggesting a more recent separation.
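The exon correspondence just described can be written as a simple numbering rule. The sketch below is illustrative only; the function name is hypothetical, and the pairing of exon 1 with exon 1 is an inference from the statement that the extra exon is inserted as exon 2, not something stated explicitly.

```python
from typing import Optional


def rho1_to_rho23_exon(rho1_exon: int) -> Optional[int]:
    """Map a ρ1 exon number (1-10) to the equivalent ρ2/ρ3 exon number (1-9).

    Returns None for ρ1 exon 2, the extra exon with no ρ2/ρ3 counterpart.
    """
    if not 1 <= rho1_exon <= 10:
        raise ValueError("the ρ1 gene has exons 1-10")
    if rho1_exon == 1:
        return 1  # assumed: the first exons correspond (inferred, not stated)
    if rho1_exon == 2:
        return None  # the inserted extra exon
    return rho1_exon - 1  # ρ1 exons 3-10 correspond to ρ2/ρ3 exons 2-9


# Print the full correspondence for all ten ρ1 exons.
for exon in range(1, 11):
    print(exon, rho1_to_rho23_exon(exon))
```

A ρ1 transcript lacking exon 2 would therefore use nine exons that map one-to-one onto the nine exons of ρ2 and ρ3.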
The acquisition of an extra exon in the ρ1 gene of this pair raised the possibility that this segment modifies the ρ1 receptor function and may be a site of alternative mRNA splicing, to give 9-exon and 10-exon forms. Two functional alternative forms have been reported in the ρ subunits, and in fact these are in ρ1 (43). The extra exon 2 is indeed spliced out, as discussed below. We also searched the genome data bases for any further transcripts that would align with the ρ subunit genes, but no differences in expressed exons appeared, with one exception. A transcript annotated as human ρ1A (Ensembl Vega transcript, accession number Q9YGQ4) was found to contain new and very different forms of exons 1 and 2. However, when these were used in a TBLASTX search of the genome, those new sequences were not found, indicating that they are artifactual.
The δ and π Subunit Genes-The π subunit is unusual in being expressed primarily in peripheral tissues, e.g. human uterus, lung, thymus, and to a greatly limited extent in the brain (44). It has been little studied as yet but was shown to have the capacity to combine with an αβ or an αβγ set of subunits to confer upon them new functional properties (45). The π gene, GABRP, was located by radiation hybrid mapping to be on chromosome 5q near the telomere (12), and in the genome we found it in 5q35 (Fig. 3C). This is outside the 4-member GABR cluster in 5q34 (Fig. 3B).
In the GABRP gene we found 10 exons, but exon 1 contains only the 5′-utr sequence, whereas exon 2 has further 5′-utr sequence (43 bp) and then encodes the first 18 amino acids of the signal peptide. Exon 10 encodes the final 100 amino acids followed by 1789 bp of 3′-utr sequence, including a polyadenylation signal and poly(A) tail sequences. Hence this gene conforms to the usual GABR pattern of nine coding exons.
The π gene location is at 8.6 Mbp distal to the end of the cluster in chromosome 5q of the genes for the β2, α6, α1, and γ2 subunits, which, for comparison, covers a total length of 1.1 Mbp (Fig. 3B). Between that cluster and the π gene other genes occur, on the order of 25 genes or more, some as yet unidentified but all clearly unrelated to GABR genes, whereas within the cluster only one non-GABR gene can be detected. This makes it uncertain whether or not the π gene was directly related in evolutionary origin to the GABR genes in that cluster. We approached this question by species comparisons. A mouse cDNA sequence with high homology to the human π cDNA sequence has been deposited in the RefSeq data base (RefSeq accession number NM_146017), and although the mouse transcript has not yet been reported as being translated and functional it has been assumed in the data base to be derived from an orthologous mouse gene, Gabrp, in view of its 93% predicted amino acid identity to the human π subunit. A BLAST search with this cDNA (99% match on the exons) locates the Gabrp gene on cosmid AL669814 on mouse chromosome 11 in the A5 sub-band (at 33.91-33.94 Mbp). Indeed, this is in the region syntenic to human chromosome 5q33-35. Furthermore, four mouse GABR genes, for the same four GABR subunits that form the cluster in that region of human chromosome 5, were found in the adjacent region of mouse chromosome 11, i.e. band B1.1. Those four genes form a similar cluster and are in the same order and orientation in the two species. The mouse Gabrp gene lies beyond the γ2 end of the cluster, as on the human chromosome, and at a distance of 8.6 Mbp in both species. The complete conservation of this arrangement of five genes during evolution from a common ancestor of mouse and man supports the concept that the π gene is not in this vicinity by chance but by divergence from an ancestral precursor of that cluster.
There is also genomic evidence that the π subunit exists in lower vertebrates. In the chicken, a homologous EST of 660 bp has been reported (EMBL data base accession number BM440205). In the pufferfish (F. rubripes) genome (currently partially sequenced), we found by searches an incomplete reading frame (SINFRUG00000148202) comprising 5.9 kbp from a putative orthologue of the human GABRP gene. Both the predicted avian and fish protein sequences gave clearly the highest match to the π subunit in reciprocal BLAST analysis against the human genome. The pufferfish gene partial sequence has eight predicted exons, which correspond to the region containing human exons 3-10 (the start of the coding region up to part of exon 3 being not yet sequenced in the Fugu genome). The introns are all small, as generally in Fugu. The predicted protein sequence of the pufferfish shares 62% identity with the human sequence over that region, with almost all of the variation being in the long second intracellular loop of each, which is generally variable. Exons 4-8 have identical lengths in the fish and human sequences. The chicken nucleotide partial sequence corresponds to the final exon 10 of the human and fish genes. The coding sequence within that exon (which is followed by a lengthy 3′-utr in each case) shares 74% identity at amino acid level between man and bird.
The δ gene, GABRD, is isolated, being the sole GABR gene on chromosome 1. It was located in band p36.3 by Windpassinger et al. (46), who also obtained cDNA sequences to deduce that it has 11 exons, 10 of which are coding. Three of these were proposed to be expressed alternatively, although three corresponding subunit cDNAs were not isolated from tissues. We have confirmed in the genome that this gene is in 1p36.3, but only eight exons are currently annotated as such in data bases, with seven of these in full agreement with the last seven exons reported in Ref. 46. First, by BLASTN search of the genome with cDNA sequences covering the other four reported exons, we found in the GABRD gene that a noncoding exon is indeed present in the 5′-utr, being in fact the first exon of the gene; this corresponds to the reported (46) GABRD "1B" sequence (except for the first 12 nucleotides of 1B, which are not present in the GABRD gene). Second, exon 2 of GABRD starts with the sequence of the "5′-variant exon 1C" of Ref. 46, which contains a further 301 bp of 5′-utr and then encodes the initial 37 amino acids including the signal peptide. However, there is no adjacent intron following, and exon 2 in fact continues with a further 115 bp that encode the next 39 amino acids. Then the remaining seven exons previously described, together with their introns, were confirmed as completing the gene. The last exon has both the C-terminal coding sequence and a considerable length of 3′-utr, including a polyadenylation signal and poly(A) tail. The updated location and genomic structure, with a total of nine exons, are shown in Fig. 6.
The apparent 11th GABRD exon (46) was termed 1A, as an initial exon expressed as an alternative to 1C and thus introducing a different signal peptide. This sequence had been obtained only from ESTs. However, after various BLAST searches with that exon 1A as a query, we cannot find the 1A sequence or anything approaching it in the human genome or in the contig involved. Windpassinger et al. (46) had also noted that they could not find exon 1A in the then available genome data bases, attributing this to gaps in the map. Therefore, we have omitted it from the GABRD structure. However, some mystery remains, because the second version of the signal peptide that is encoded by that proposed alternative exon 1A (46) would have 60% identity to the rat and mouse signal peptides, as known from their cDNAs. In particular, 1A would translate to give an unusual motif in the center of the signal peptide, LLXPLLLLC, which does occur identically there in those two rodent δ subunits. In contrast, the signal peptide encoded by exon 1C, as found also in the presently known human δ subunit cDNA, has no homology in the signal region to the rat or mouse δ subunits. Therefore, we cannot fully exclude an additional coding exon, which would be the same as or similar to exon 1A of Ref. 46 and which can be spliced in as an alternative to exon 2. More cloning of human δ cDNAs from different tissues is needed to test for this possibility. If found, it must still be explained why it is not detected in the genome by normal procedures, because no gap in the tile path in this region is now shown. A population difference is a conceivable cause.
FIG. 6. The location, direction, length (in kbp), and exon/intron structure of the δ subunit gene on human chromosome 1. Two known alternatively spliced transcripts, 1a and 1b, are indicated, either with or without the wholly utr exon 1.
Another δ transcript with a different 5′-end was also proposed, based upon ESTs and 5′-extensions made by PCR of partial δ cDNAs (46). That would be without the aforementioned 1B 5′-utr sequence. We find that the latter segment constitutes exon 1, so alternative splicing might occur there to give the a and b transcripts indicated in Fig. 6. Evidence is still lacking for tissue expression of the entire native δ subunit mRNA in alternative forms, and likewise for their translation and functionality.
Location of a Gene Related to the Three Known Human Glycine Receptor α-Subunit Genes-Our search with the 8-point Cys loop algorithm (Fig. 2A) gave a strong match at an unknown gene sequence on the X chromosome (q-arm). The predicted protein sequence around this Cys loop match has a Thr at 5 positions before the first Cys and a Met at 2 positions after the second Cys, a signature of the glycine receptor α subunits (Fig. 2D). For the three human glycine receptor α subunits, α1-α3, which have previously been cloned, genes have been located on chromosomes 5, X (p-arm), and 4, respectively (47). Those three genes (GLRA1-3) also gave strong matches with the 8-point (2A) Cys loop screen. A fourth glycine receptor α subunit cDNA from the mouse has been cloned and expressed (48) but has not been found in man, and the question whether a human glycine receptor α4 exists has been left open (49). None of the human genome data bases or browsers currently lists a GLRA4 gene. Therefore, the mouse α4 protein sequence of Harvey et al. (48) was used as a query sequence (TBLASTX) to search the human genome. We retrieved nine peptide sequences that show a high degree of homology to the mouse Glra4 exons. Nucleotide probes based on these peptide sequences were then used as the basis of a nucleotide search (BLASTN), and we were able to locate all nine probes as exons of one gene within two adjacent contigs in Xq22.3 (Fig. 7). These encode a sequence closely related to the mouse glycine receptor α4 subunit, but this has a stop codon just before the predicted 4th transmembrane domain (Fig. 7) and is thus a pseudogene. This gene lies far from the GABR cluster present on Xq (Fig. 3E), which starts at 47 Mbp downstream (Fig. 7). It is noteworthy that the mouse gene for the functional glycine α4 subunit is also located on the X chromosome (49) in a region that is, in fact, syntenic to that in Fig. 7.
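The glycine receptor α signature described above (Thr 5 positions before the first Cys, Met 2 positions after the second Cys) can be expressed as a simple pattern search. The sketch below is not the 8-point consensus of Fig. 2A, which is not reproduced in this section. It assumes the canonical spacing of 13 residues between the two Cys of the loop, it reads "5 positions before" as index minus 5, and the function name and test peptide are hypothetical.

```python
import re
from typing import List

# Assumed pattern: T, 4 residues, Cys, 13 loop residues, Cys, 1 residue, M.
# A lookahead is used so that overlapping candidate sites are all reported.
GLRA_LIKE = re.compile(r"(?=(T.{4}C.{13}C.M))")


def find_glra_like_cys_loops(protein: str) -> List[int]:
    """Return 0-based positions of the first Cys of each signature match."""
    return [m.start() + 5 for m in GLRA_LIKE.finditer(protein.upper())]


# Synthetic peptide (assumed data): T at index 3, Cys at 8 and 22, M at 24.
peptide = "AAA" + "T" + "GGGG" + "C" + "P" * 13 + "C" + "A" + "M" + "KK"
print(find_glra_like_cys_loops(peptide))
```

Applied to a six-frame translation of genomic sequence, a screen of this kind would only flag candidates; each hit would still need to be confirmed by homology searches such as those described in the text.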

Six Genomic Mechanisms for Creation of Multiple Forms of GABR Subunits by Alternative Splicing
With four of the GABA A receptor subunits, two alternatively spliced functional proteins have been found earlier to be expressed in tissues, as noted below for β2, γ2, γ3, and ρ1. However, for nine GABR subunits in all, alternative splicing of the pre-mRNA can be found, and we here relate this to their genomic structures.
An alternative, functional "long" form (γ2L) with an insertion in the second intracellular loop was reported previously in the γ2 subunit and another in β2 (β2L) (Table I) but at different positions along that loop. The exon spliced in is now identified as exon 9 or 10, respectively, in the human genome (Table I).
For the β2 case, the subunit is exceptionally conserved, with all its other coding exons identical in human and rat genomes other than one codon difference (Ser/Thr-372). When compared with that conservation of the short form (seen even in the chicken (50) β2 protein, 98% identical to human), the reported lack of detection (50) of a rat or bovine version of the alternatively spliced long (β2L) mRNA could mean that this additional exon present in the human gene is of low biological significance or has a special role in man. Therefore, we searched (TBLASTN) the rat and mouse genomes; a β2 subunit gene exists there on rat chromosome 10 and on mouse chromosome 11. Although those rodent genes are annotated as having only 9 exons, our searches showed first that each gene has (as in man, Table I) an additional initial exon that contains only 5′-utr sequence. Second, a genomic sequence exists within each that corresponds to the spliced human β2 exon 10 in sequence and position and with the same boundaries. Its start is at 27,795,098 bp along rat chromosome 10 but encoding 31 amino acids in place of the 38 in man and in mouse. The rat sequence is IFYKDIKQNRTQYQSLWDPT-RWTTYYHFSLY-, where the dashes denote a 6-amino acid internal deletion and a C-terminal single one and boldface type denotes the five differences, all relative to the human. The mouse exon 10 differs from human exon 10 only at two positions, 1 (M) and 22 (D). Therefore, there are 11 exons in GABRB2 in all three species. Four of the five potential protein kinase sites in human and mouse exon 10 (Table I) are retained in the rat. All of the various features conserved support the view that the known alternative splicing of exon 10 in the human β2 transcript is of functional importance.
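The residue counts quoted for rat exon 10 can be checked directly from the sequence as printed. The short sketch below does only that bookkeeping; the constant and function names are hypothetical, and the deletion sizes are those stated in the text (each dash marks one deletion site, not one residue).

```python
# Rat β2 exon 10 peptide as printed in the text; "-" marks a deletion site.
RAT_EXON10 = "IFYKDIKQNRTQYQSLWDPT-RWTTYYHFSLY-"

HUMAN_EXON10_RESIDUES = 38   # stated for man and mouse
INTERNAL_DELETION = 6        # residues lost at the first dash
C_TERMINAL_DELETION = 1      # residue lost at the final dash


def residue_count(aligned: str) -> int:
    """Count amino acid letters, ignoring deletion markers."""
    return sum(1 for ch in aligned if ch != "-")


rat_residues = residue_count(RAT_EXON10)
expected = HUMAN_EXON10_RESIDUES - INTERNAL_DELETION - C_TERMINAL_DELETION
print(rat_residues, expected)
```

Both values come to 31, so the printed rat sequence is consistent with the stated 38-residue human exon less the two deletions.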
Five other types of alternative splicing found in the GABA A receptors are described in Table I. Type 2 is seen in the β3 gene. It is known (31) that the relative abundance of the two forms of β3 mRNA varies with the tissue, that each 5′-utr in question contains functional promoter activity, and that the first exon 5′-utr sequence is almost identical in human and rat DNA. Those results suggest that this alternative splicing has a role in the control of expression of the mature β3 subunit. Furthermore, the full exon/intron structure of GABRB3 was reported by Glatt et al. (17), revealing 10 exons, confirmed now in the genome. Nevertheless, each expressed form of this subunit again uses nine exons, as in other subunits.
For type 3, alternative splicing creates transcripts that differ only in their 5′-utr sequences and promoters. Six α2 subunit forms can thus arise (Table I), all being functional and differentially expressed in development in the rat (51), and we found equivalent exons for all such forms in the human α2 gene. The human α5 gene exhibits a similar phenomenon, not found in other GABR genes (Table I). By searching the relevant genome regions, we found 12 exons (3 noncoding) in the human α2 gene and confirmed 13 in α5 with details shown in Fig. 4C. Their 5′-utr sequences are, again, unusually conserved between rat and human, consistent with a regulatory function.
In type 4, alternative forms are expressed with several coding exons (contiguous except in β2) spliced out together. For the ρ1 subunit, two variants of the original ρ1 were cloned (43). One is active and has a type 1 deletion (Table I), which we found to correspond to the loss of exon 2. The other is inactive, losing a 150-amino acid segment of the N-terminal domain. We found this is due to the loss of exons 2-5, so that a basis for the proposed alternative splicings exists. The active ρ1 form lacking only exon 2 would be equivalent in exonic structure to ρ2 and ρ3 (Fig. 5) and therefore may well have some biological significance.
For the α4 subunit, a very severe, inactivating truncation arises in both the human and mouse α4 transcripts from the deletion of 929 bp and an introduced frameshift (52). This product corresponds to the splicing out of exons 3-8 of the nine α4 exons. Mu et al. (52) suggest that such inactive truncated forms, where expressed as polypeptides, may have a regulatory function in the assembly or trafficking of some GABA A receptors.
Proteins that are much truncated are also predicted to arise from alternative splicing of the ε gene mRNA. In the studies of X chromosome genomic DNA clones by Wilke et al. (33), as noted earlier, the human GABRE gene was deduced to contain eight exons, exceptional among GABR genes. However, Sinkkonen et al. (36) found nine exons for the ε subunit by screening libraries with fragments of the ε cDNA. We have confirmed a total of 9 ε subunit exons in the genome, all nine being coding exons. Four expressed ε variants can be found, the first being the full-length functional transcript (33, 53-55). In variant 2, exons 1-3 are spliced out, with protein translation then starting at an alternative Met. In variant 3 that same splicing occurs in combination with deletion of residues 127-158 from the center of exon 4, using an internal cryptic site there for intron-independent splicing. In variant 4 only exon 1 is deleted. The mRNAs of variants 2 and 3 have indeed been found by Wilke et al. (33) in several peripheral tissues. The full-length human ε expression is high in heart and lung and is also present in brain (33,54,55), and the same is true for the rat (36); in human heart it is confined to the electrical conduction system and is located throughout it (54), suggesting a specialized non-neuronal role there. The other variants lack the signal peptide and also (for forms 2 and 3) lack a significant region of the extracellular N-terminal domain, but roles for those in combination with other (full-length) subunits or in regulation of GABA receptor trafficking remain possibilities.
Another origin of two forms of a GABR subunit can be an intronless mRNA splicing at an internal cryptic consensus splice site (type 5). This was originally found for two chicken β4 transcripts (56). It also occurs with the human γ3 gene, where Poulsen et al. (57) found a second transcript having an insert of only 18 bp, which is similar in its position and in introducing a protein kinase C site to that created by conventional splicing in γ2 mRNA. For the γ3 case, the secondary splicing occurs at a cryptic site in the immediately adjacent intronic sequence; we confirmed that no corresponding extra coding exon exists in this gene (Fig. 4D). No transcript which has been conventionally spliced at an exon/intron boundary has been reported for γ3, but it cannot yet be excluded that a 9-exon form is also expressed in some locations.
Alternative splicing has thus been reported for the pre-mRNAs of two of the three γ subunits, but no such additional full-length γ1 transcript (nor an indicative EST) has been reported. Indeed, by analysis of the GABRG1 gene, we found only the nine exons that correspond, in their lengths and positions, to the nine exons in GABRG2 that encode the short form of the γ2 protein (data not shown).
Finally, a novel form of GABR alternative splicing (type 6) has been discovered recently in the human γ2 gene product, independent of its exon-9 splicing described above. In this form, an additional 120-bp exon is created at a weak splice site inside intron 5 by an Alu repetitive element residing there (58). The protein product may affect the trafficking of active GABA A receptors (58). We can regard this phenomenon as inserting a pseudo-exon 5a. The 300-bp Alu sequence is specific to primates and is common in man, producing ~5% of human alternative splicing. It is also present in introns of several other GABR genes: α1, α3, α5, β1, γ1, ρ1, and . If in any of those genes a favorable intronic splice site is created, this could give rise to a well expressed alternative GABR form not predictable from the exon content.

Common Features of the GABR Gene Clusters
The clusters have now been specified more fully than was previously possible. It is clear that there is a common pattern of GABR gene arrangement over chromosomes 4, 5, 15, and X of β-[α]-α-γ, where there are either 1 or 2 α genes and where in one case (on X) a β gene can be replaced by the related θ and the γ by the related ε (36) (Figs. 4 and 5). The GABR set appears to exhibit the most extensive degree of clustering of the genes relative to their total number which is known (excluding the special case of the exceptionally numerous olfactory and gustatory receptors) for any one receptor family. Because the chromosome 15 cluster has the same pattern as the clusters on chromosomes 4 and 5 but lacking a second α subunit gene, we searched the intervening sequences within that cluster (see above) for any unidentified GABR-like gene and found none.
FIG. 7. Location on the X chromosome of the hit registered with consensus 2A at a human glycine receptor-related sequence. This was seen to be closely related to the α4 subunit (48) of the mouse glycine receptor. However, it is a human GLRA4 pseudogene, having a stop codon in exon 9 at amino acid position 433. This deletes the 4th transmembrane domain of the α4 subunit and so is unlikely to be in an active receptor, although some other role for it is not excluded. The location, length (in bp), and exon/intron organization of this human GLR α4 pseudogene are shown. It spreads over 2 contigs; the first 7 exons are in Z93848 and exons 8 and 9 are in AL049610, all in antisense direction.
The δ gene and the three ρ genes lie outside those clusters (Figs. 3, 5, and 6). The ρ subunits can co-assemble in a pool separate from that of the other GABR subunits and produce a very distinctive pharmacology (59). The chromosomal isolation of the ρ genes, taken together with the latter features, suggests that an ancestral ρ separated from the other GABR genes at a very early stage in GABR evolution. This is supported by the position of the ρ set in dendrograms of vertebrate/invertebrate GABR subunits; there the mammalian ρ is much closer to the RDL GABR subtype (which has some ρ-like pharmacological features) from insects (23) and from C. elegans (60) than to other mammalian GABR subunits.
The transcriptional orientations of the GABR genes have been observed previously to differ within a cluster (13,17). We could now confirm directly that in the chromosomes those orientations are in a uniform pattern in the chromosome 4 clusters, with the β gene (or θ, in the X-chromosome cluster) running head-to-head to its α subunit neighbor. This may indicate that a common regulatory element lies between that pair of α and β genes. Some support for this idea comes from knockout mice in which the α6 gene was disrupted by neomycin gene insertion into its exon 8 (61). The mRNA and protein levels of the α1 and β2 subunits in the forebrain were selectively decreased, those being the products of the two chromosomal neighbors of GABRA6. One possible interpretation of the GABR gene clusters would be that there is an inbuilt mechanism there for some co-ordination of their expression, but if so this must be regionally or developmentally restricted, because GABR subunit co-assembly in general is not at all restricted to combine subunits from each clustered gene set. It has been noted previously, however, that some common GABR co-assemblies are indeed produced from co-clustered genes, e.g. α1β2γ2 is the most abundant GABR subtype (62). Another interpretation of the clusters (13,29,30) is that they arose from an ancestral αβγ receptor set by a series of duplications, sequence divergences, and chromosomal translocations. There is evidence, for example, that human chromosomes 4 and 5 had a common single chromosome ancestor (63), which could explain the similar pattern of their GABR clusters. So far as the GABR genes have been mapped to mouse chromosomes, they form the same clusters with the same order and spacing, in diverse regions that are syntenic to the translocated regions carrying them on the human chromosomes. This is exemplified by the results we described for the mouse equivalent of the human chromosome 5 cluster.
Hence these clusters remain stable in some chromosome rearrangements, giving credence to the hypothesis that they survived throughout earlier ones. The two interpretations of the clustering are not mutually exclusive.

Common Features of the GABR Genes
A common pattern persisting through at least 17 of the GABR genes was found, i.e. with nine coding exons expressed in a major transcript. Only two of the genes are apparent exceptions: GABRD has 8 and GABRG3 has 10 coding exons, with no known alternative exon splicing. The overall sizes of the genes (or transcription units), where estimated at all in the past, were sometimes in error. They cover a surprising range for one family, from 8 (GABRD) to 601 kbp (GABRG3) (Figs. 3-6). The cluster on chromosome 15 is exceptional, with α, β, and γ genes all having very large introns, which may be a sign of a different evolutionary history to the other clusters.
Protein product cannot reach cell surface (58)
a The total number of exons, coding and non-coding, as determined or confirmed here on the genome.
b The β2 gene has 11 exons: exon 1 has 74 bp of 5′-utr (unreported), exon 2 has 152 bp of 5′-utr plus 77 bp coding for the signal peptide, exon 10 is expressed only in the "β2L" form.
c This may occur also with the δ subunit, but expression in vivo of the second form has not yet been confirmed; see text on the δ subunit.
d In the human genome we located, using Grail Express analysis, a similar set of three 5′-utr exons with very high homology to the rat set, but alternative transcripts formed with these have not been investigated so far.
e Ensembl sequence ENSESTT00000025025.
The ubiquitous presence of the disulfide-bridged loop structure provided an essential tool for this study. Our analyses of all human GABR genes also found that the exon encoding the Cys loop and surrounding sequence has an invariant length, i.e. 83 bp. This holds further in all GABR subunit gene structures known in the vertebrates. This precise conservation of exon length is unique among the exons of GABR genes and again suggests a common and essential role for this structure in all vertebrate GABA A receptors. The role of the Cys loop has been studied in the nAChRs, where evidence has been found that it acts as an essential core in the tertiary folding of the major N-terminal domain of each subunit and that it guides, by a conformational change in the loop, the interactions of the subunits to form the active oligomer (64,65). In addition, two recent mutagenesis studies, on Cys loop residues in GABA A and in glycine receptors (66,67), have provided direct evidence for a role for the loop in relaying the agonist binding conformational changes to the channel domain. We deduce that these requirements must impose an invariant framework on the Cys loop structure in the entire GABR family and throughout its evolutionary history.
Furthermore, there is a striking absolute conservation, throughout the known vertebrate and invertebrate GABRs, of a single associated residue outside the loop, namely Ser at 6 places downstream (Fig. 2E). This Ser is likewise maintained in two of the other families of vertebrate Cys loop receptors (Fig. 2). This Ser lies in loop B of the current structurally based 6-loop model of the ligand binding domain, first found in the nAChRs (68) and extended also (69) to vertebrate GABRs. The essentiality of this Ser in the human GABR γ2 subunit has been demonstrated very recently by its mutation, which prevents active receptor formation with wild-type α and β subunits (58). We note that all of the evidence cited here supports the concept of a constant and essential role for the framework (consensus E) that we have used in the analysis.