Respiratory chain Complex I of unparalleled divergence in diplonemids

Mitochondrial genes of Euglenozoa (Kinetoplastida, Diplonemea, and Euglenida) are notorious for being barely recognizable, raising the question of whether such divergent genes actually code for functional proteins. Here we demonstrate the translation and identify the function of five previously unassigned y genes encoded by mitochondrial DNA (mtDNA) of diplonemids. As is the rule in diplonemid mitochondria, y genes are fragmented, with gene pieces transcribed separately and then trans-spliced to form contiguous mRNAs. Further, y transcripts undergo massive RNA editing, including uridine insertions that generate up to 16-residue-long phenylalanine tracts, a feature otherwise absent from conserved mitochondrial proteins. By protein sequence analyses, MS, and enzymatic assays in Diplonema papillatum, we show that these y genes encode the subunits Nad2, -3, -4L, -6, and -9 of the respiratory chain Complex I (CI; NADH:ubiquinone oxidoreductase). The few conserved residues of these proteins are essentially those involved in proton pumping across the inner mitochondrial membrane and in coupling ubiquinone reduction to proton pumping (Nad2, -3, -4L, and -6) and in interactions with subunits containing electron-transporting Fe-S clusters (Nad9). Thus, in diplonemids, 10 CI subunits are mtDNA-encoded. Further, MS of D. papillatum CI allowed identification of 26 conventional and 15 putative diplonemid-specific nucleus-encoded components. Most conventional accessory subunits are well-conserved but unusually long, possibly compensating for the streamlined mtDNA-encoded components and for missing, otherwise widely distributed, conventional subunits. Finally, D. papillatum CI predominantly exists as a supercomplex I:III:IV that is exceptionally stable, making this protist an organism of choice for structural studies.

Diplonemids have been traditionally considered a species-poor and insignificant group. Recently, this view has drastically changed. Global environmental surveys have revealed that diplonemids are, in fact, among the most abundant and diverse marine eukaryotes on Earth (31,32).
In addition to rapidly evolving mitochondrial gene sequences, the mitochondrial genome structure of diplonemids is most eccentric (33,34). mtDNA of the type species Diplonema papillatum consists of ~80 distinct circular chromosomes either 6 or 7 kilobase pairs long, and its mitochondrial genes are split into up to 11 pieces (modules) of 43-534 nt. Each such piece is encoded individually on one of the DNA circles, transcribed separately into RNA precursors, end-processed to contain exclusively coding sequence, and finally joined into contiguous RNAs (35,36). The molecular mechanism by which this unique trans-splicing proceeds has yet to be unraveled.
A further remarkable post-transcriptional process in diplonemid mitochondria is RNA editing, which affects more than three-fourths of mtDNA-encoded genes. One type of RNA editing observed in this group somewhat resembles U-insertion editing in kinetoplastid mitochondria, as up to 50 uridines are added at 3′-ends of certain gene module transcripts. This "U-appendage" RNA editing takes place before trans-splicing. The other RNA editing types are C-to-U and A-to-I substitutions, which are tightly clustered (e.g. 29 sites within a stretch of 55 nt in the nad4 pre-mRNA of D. papillatum). RNA editing in diplonemid mitochondria is function-critical, restoring reading frames and compensating for lost portions of gene sequences (37).
In this report, we present experimental and in silico evidence that the enigmatic mitochondrial y genes of diplonemids in fact code for CI subunits. We further establish the catalogue of nucleus-encoded CI proteins for D. papillatum. These y genes illustrate the extent to which fast-evolving mtDNAs such as those from diplonemids are exploring the functional limits of sequence space. In addition, the example of y genes demarcates the limitations of computational function assignment based on protein sequence alone, with up to 30% of yet unassigned protein-coding genes even in genomes of model organisms such as yeast, Escherichia coli, Arabidopsis, and humans (39,40).

Figure caption (phylogenetic tree): Diplonemids (blue) belong to Euglenozoa (shaded background) together with kinetoplastids and euglenids. The phylogenetic tree was constructed using 10 mitochondrion-encoded Nad proteins from representative species of major eukaryotic groups. Note the high evolutionary rates in euglenozoans. Scale bar, number of per-site substitutions.

Gene structure and post-transcriptional maturation of y genes in D. papillatum mtDNA
We reported earlier six unidentified genes (y1-y6) in mtDNA of D. papillatum. These genes consist of two to five modules that are transcribed separately, followed by trans-splicing of module transcripts into contiguous polyadenylated mRNAs (Table 1). Preliminary tandem MS data supported the notion that these y genes code for proteins rather than structural RNAs (37). Compared with the assigned mitochondrion-encoded genes of D. papillatum, RNA editing of y genes is much more prevalent. Post-transcriptional substitutions are observed in more than half of the y genes, but only in 20% of the assigned genes. Further, U-appendage RNA editing affects all y genes, but only 60% of the assigned genes, with the numbers of both sites per gene and uridines added per site being considerably higher in the y genes (Table 1).
To conduct more sensitive searches of homologs beyond diplonemids, we built for each Y protein multiple alignments of sequences from the four taxa and constructed profile hidden Markov models (HMMs), with which we searched for homologs among GenBank™ RefSeq mitochondrial proteins from diverse eukaryotes. Y1 was identified as a potential divergent Nad3 (E-value ~10⁻⁴), but no significant hits (if any) were found for other Y proteins. More convincing results were obtained with profile HMM-profile HMM comparisons, which are more sensitive than profile-sequence comparison methods at detecting remote similarity (41). For that, profile HMMs were built from a taxonomically broad sampling for each of the mitochondrion-encoded CI subunits not previously known to be encoded by diplonemid mtDNA (Nad2, -3, -4L, -6, and -9). Comparison of these Nad profile HMMs with Y-protein profile HMMs yielded top hits with E-values ranging from 8.1 × 10⁻⁴ (Nad9) to 5.4 × 10⁻² (Nad6), assigning Y proteins to Nad2 (Y3), Nad3 (Y1), Nad4L (Y6), Nad6 (Y5), and Nad9 (Y2), all subunits whose genes previously seemed missing from D. papillatum mtDNA (Table 1, Fig. 2 (A-E), and Figs. S1 and S2).
The assignment of Y5 to Nad6 was the most challenging, because the sequence similarity is only marginal. Still, two protein features support this assignment. First, polar and apolar residues in Y5 homologs are distributed across the protein in a similar fashion as in Nad6 (although suggesting four instead of five transmembrane helices (TMHs) in Y5; Fig. S3). Second, all diplonemids share a conserved hydrophobic patch that in other Nad6 homologs contains the tentatively functionally important and conserved Tyr residue of the third TMH (Tyr/Phe in diplonemids; Fig. 2D) (42). More details on the evolutionary remodeling of diplonemid Nad6(Y5) are presented in the supporting Results and Figs. S1-S3.

Y proteins are components of the respiratory chain Complex I
To verify whether Y proteins are indeed part of CI, we separated mitochondrial multiprotein complexes of D. papillatum by blue native PAGE (BN-PAGE). Complex I was identified by in-gel staining of NADH-dehydrogenase activity, revealing three distinct bands migrating at 2.1-2.5 MDa, 1.3-1.5 MDa, and 1.1-1.3 MDa (Fig. 3 and Fig. S4). In the following, we will refer to these bands as large, medium, and small, respectively.

Table fragment (rows incomplete): ATP synthase subunit 6 (CV); protein of unknown function; y5 (nad6) b, NADH dehydrogenase subunit 6 (CI). Footnotes: a, Homologs are present in euglenids (E. gracilis) and kinetoplastids (T. brucei). b, Homologs have not been detected in euglenids but are present in kinetoplastids (T. brucei): murf1 (nad2), cr3 (nad4L), cr4 (nad6), and cr5 (nad3).
The ratio of NADH-dehydrogenase activity (relative to protein quantity) across these bands was ~2:3:3 according to a time series of activity staining.

All three BN-PAGE bands were analyzed by MS (Table S1). For that, the bands were digested with trypsin or with trypsin and chymotrypsin combined, the latter to increase the number of peptides in the observable size range for mtDNA-encoded proteins. Trypsin digestion produced peptides of a subset of assigned mtDNA-encoded proteins and four of the six Y proteins (Y2, Y3, Y4, and Y5), whereas combined digestion yielded peptides of all assigned mtDNA-encoded Nad proteins, as well as five Y proteins (Y1, Y2, Y3, Y4, and Y6; Table 2 and Fig. S5).
To determine which components belong to the same complex, we performed protein quantification based on peptide intensities (intensity-based absolute quantification (iBAQ) (45)), as well as spectral counts (protein abundance index (PAI) (46)) versus protein enrichment in a given CI band relative to total mitochondrial lysate. The results of this analysis (Fig. 5 and Table S2) strongly indicated that Y1, Y2, Y3, and Y6 are indeed part of the same complex as the previously assigned Nad proteins (and nucleus-encoded CI subunits; see below). Due to technical challenges, Y5 could not be reliably quantified (see also supporting Results).
The only remaining unassigned protein is Y4. It is predicted to have two TMHs and was detected in all three CI bands (with the highest abundance in the large band; see Table S2). Because no homologs were found in the inferred proteomes of other diplonemids (38), we speculate that Y4 could be a novel, fast-evolving accessory protein of the diplonemid respiratory chain.

Composition of Complex I from D. papillatum
We investigated which of the proteins detected in MS experiments have significant sequence similarity with CI subunits from model organisms. Specifically, we performed sequence similarity searches of all experimentally confirmed NADH:ubiquinone oxidoreductase subunits from model organisms against the D. papillatum proteome. Inversely, we searched for homologs of all proteins, for which peptides were detected by MS analysis of the three complexes, against GenBank™. When an otherwise conserved CI component was not found by BLAST (e.g. NDUFA2, NDUFA8, and NDUFS6), the corresponding Pfam or in-house-generated profile HMM was used to search against the entire genome-inferred Diplonema proteome.
By employing this strategy, we identified among the experimentally determined Diplonema proteins the expected four nucleus-encoded core subunits and 16 of the ~25 accessory subunits that are broadly distributed across eukaryotes (Table 2 and Fig. 4). In addition, all three examined CI-containing complexes from Diplonema possess six of ~35 accessory proteins that were reported to be associated specifically with CI from Trypanosoma brucei (Table 2 and Fig. 4) (20,47). Thus, the count of nucleus-encoded CI subunits having homologs in other lineages (hereafter termed "conventional") is 26. It should be noted that D. papillatum appears to lack certain otherwise widely distributed subunits, such as NDUFS4, NDUFA1, NDUFA11, and NDUFB1-NDUFB5 (Fig. 4 and Table S3), as their homologs are absent from the inferred proteome.
Next, we asked whether the 26 conventional nucleus-encoded components indeed co-purify with CI. For that, we performed a quantitative analysis of the MS-identified proteins contained in the three BN-PAGE bands. Two metrics were employed: the abundance determined by normalized peptide intensities (iBAQ values) and the enrichment in bands compared with the mitochondrial lysate. In the three bands, all but one of the nucleus-encoded CI subunits were highly abundant and enriched (Fig. 5 and Table S2). The only exception was Diplonema's NDUFAB1 homolog, which was less enriched, probably because it is also part of other mitochondrial complexes (for details, see "Discussion").
Each of the BN-PAGE bands also contained 15 additional nucleus-encoded proteins with a similar enrichment profile as bona fide CI subunits but without significant sequence similarity to known subunits (Fig. 5 and Fig. S6). Four of these proteins contain domains known to be involved in oxidoreductive processes, such as FAD and NAD binding, superoxide dismutase, and adrenodoxin, and five other proteins are predicted to have transmembrane helices (Table 3). These 15 novel proteins, which are conserved across the four diplonemid species, probably represent diplonemid-specific CI subunits.
The set of conventional nucleus-encoded CI subunits, together with the 10 mtDNA-encoded Nad proteins and the putative 15 novel components (in total ~425 kDa), adds up to a cumulative size of ~1.6 MDa, which is compatible with the migration behavior of the medium BN-PAGE band displaying NADH-dehydrogenase activity (Fig. 3).

Respirasome of D. papillatum
In addition to the 36 assigned conventional and 15 putative novel CI subunits, the large BN-PAGE band also included the complete subunit set of Complex III (9 proteins, ~268 kDa) and Complex IV (10 proteins, ~304 kDa) and thus most likely represents the respiratory supercomplex SCI:III:IV, also referred to as respirasome (Fig. 5, Fig. S6, and Table S2). Assuming the usual respirasome stoichiometry (1:2:1), the cumulative predicted size of all proteins is ~2.5 MDa, which is in good agreement with the inferred size of the large BN-PAGE complex (Fig. 3 and Table 4).
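The size estimate above can be checked with a short calculation. The following sketch uses only the rounded masses quoted in the text (CI ~1.6 MDa, CIII ~268 kDa, CIV ~304 kDa) and the assumed 1:2:1 stoichiometry; the function and variable names are our own shorthand and are not part of the original analysis.

```python
# Rounded masses quoted in the text, in kDa (assumed inputs, not exact values).
MASS_KDA = {"CI": 1600.0, "CIII": 268.0, "CIV": 304.0}

def supercomplex_mass_kda(stoichiometry):
    """Sum the component masses weighted by their copy number."""
    return sum(MASS_KDA[name] * copies for name, copies in stoichiometry.items())

# Usual respirasome stoichiometry I:III:IV = 1:2:1.
respirasome = supercomplex_mass_kda({"CI": 1, "CIII": 2, "CIV": 1})
print(f"{respirasome / 1000:.2f} MDa")  # 2.44 MDa
```

With these rounded inputs the sum is about 2.44 MDa, which lies within the 2.1-2.5 MDa migration range of the large band and is close to the ~2.5 MDa stated above.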
The large BN-PAGE band also contained 24 additional proteins, which were below the abundance and/or enrichment inclusion threshold in at least one of the other two complexes (Fig. 5). Only three of these proteins had identifiable domains (phosphoribosyl transferase, aldehyde dehydrogenase, and prohibitin), and eight proteins were predicted to contain TMHs. Further experiments will be needed to determine whether they are respiratory chain components.

Diplonemid mitochondrial Y genes code for highly derived Complex I subunits
Combined experimental and computational approaches revealed that the Y proteins of diplonemids are mitochondrion-encoded CI subunits. Several of these subunits (e.g. Nad9 and Nad2) in diplonemids are even more derived than their kinetoplastid counterparts, which existed unassigned in the literature for more than 20 years under the acronyms MURF and CR (e.g. see Duarte et al. (44) and Opperdoes et al. (48)).
The detection of Y (and other Nad) proteins by MS further demonstrates that trans-spliced and edited mRNAs are indeed translated, because certain observed peptides cover module junctions and post-transcriptionally added U-tracts (Fig. S5). Particularly remarkable is the faithful translation of >10-nt-long U tracts. Although homopolymeric stretches in mRNAs are known to induce nucleotide skipping or back-slipping of the translational machinery (see references in Burger et al. (49)), we did not observe any indication of such ribosomal frameshifting in D. papillatum.

Table footnotes (fragment; the beginning is missing): [...] Table S3. c, Subunit type: C, core; AB, accessory, broadly distributed; AOO, accessory outside opisthokonts; AK, accessory in kinetoplastids. d, Experimental evidence for cleavage of a mitochondrial targeting peptide (MTP) and the MTP size. +, MTP and its cleavage site predicted and consistent with MS peptide data (an asterisk indicates weaker evidence due to sparse coverage at the protein's N terminus). −, MS evidence that no oligopeptide is removed from the protein's N terminus. ?, predicted MTP and cleavage site inconsistent with MS peptide data coverage. NA, not applicable (mtDNA-encoded protein). e, Number of unique peptides and sequence coverage were calculated using MaxQuant (for details, see supporting Experimental procedures). For data on the small and medium BN-PAGE complexes and data analysis using Mascot, see Table S3. f, Detected in Mascot analyses, but not by MaxQuant.

Conspicuous low-complexity regions in the newly assigned Nad proteins of diplonemids
The most extreme case of U-appendage RNA editing occurs in nad6 (y5). In all examined diplonemids, the nad6 mRNA contains 50 post-transcriptionally added uridines specifying a 16-residue-long Phe tract. Single-amino acid (homopeptide) repeats such as these Phe tracts form low-complexity regions in the corresponding protein. Reported low-complexity region proteins, which are quite common across eukaryotes and bacteria, are made up predominantly of Ser, Gly, Ala, Asn, and Gln residues, whereas Phe repeats are among the rarest (50,51). Due to their bulky and hydrophobic nature, Phe residues (and also Leu, Ile, and Val) occur mostly in TMHs (52,53). Accordingly, computational protein-structure analyses predict that the Phe tracts of diplonemid Nad2 (Y3) and Nad6 (Y5) (as well as Nad1 and Nad5 (38)) are part of TMHs that are superimposable with TMHs from homologs whose 3D structure is known (Fig. 2 and Fig. S1). The only exception is Nad9 (Y2), which in all diplonemids contains six consecutive Phe residues that are not predicted to form a transmembrane helix. Structural alignments to Nad9 counterparts with resolved 3D structures indicate that this Phe tract corresponds to a segment that is buried deep inside the protein bundle of the Q module.
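How an added U tract becomes a Phe tract follows from the genetic code, in which UUU specifies phenylalanine. The sketch below is a simplification: it assumes the tract is read in frame starting at its first uridine and ignores flanking nucleotides. The function name is hypothetical.

```python
def phe_tract_from_u_run(n_uridines):
    """Return (number of complete UUU codons, leftover uridines)."""
    return divmod(n_uridines, 3)

codons, leftover = phe_tract_from_u_run(50)
print("F" * codons, leftover)  # 16 Phe residues, 2 uridines left over
```

Under this assumption, 50 uridines yield 16 complete UUU codons, matching the 16-residue Phe tract; the two remaining uridines would presumably contribute to an adjacent codon together with a flanking nucleotide.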

Structural consequences of divergent mitochondrion-encoded subunits
Several mitochondrion-encoded components of diplonemid CI lack TMHs that are otherwise present in proteins from most other eukaryotic groups, including kinetoplastids. For example, instead of the canonical 14 TMHs, Nad2 (Y3) of diplonemids is predicted to contain only 11; the three N-terminal helices are absent, as in mammalian Nad2 (Figs. S1 and S2). Because in the 3D structures of mammalian CI, these missing segments are not structurally compensated for by other proteins (54, 55), they appear to be dispensable.
Structural deviations are also observed in diplonemid Nad4L (Y6) and Nad6 (Y5), in which the otherwise highly conserved most N-terminal TMH is missing or severely truncated. (The same appears to apply to their kinetoplastid counterparts (Fig. S1).) In all determined CI structures, the N-terminal TMHs of Nad4L and Nad6 are in physical contact with one another (10,54,55). Therefore, the loss/truncation of both helices in diplonemids may be the result of protein co-evolution. It remains to be determined how, or if at all, this intersubunit interaction has been replaced in diplonemids (see also supporting Discussion).

Figure caption (fragment; the beginning is missing): [...] Table S2. B, enrichment-abundance plots highlighting only protein classes of interest observed in the trypsin digestion of the large band. Note that the CIII and CIV outliers are the mtDNA-encoded proteins Cob (CIII) and Cox1 and Cox2 (CIV); as all three are within the thresholds in the trypsin + chymotrypsin digestion of the large band (see also Table S2), their levels in the trypsin digestion are an artifact, likely due to their physicochemical properties.

Functional consequences of divergent mitochondrion-encoded subunits
The majority of diplonemid Y proteins are components of the Pp module (i.e. the membrane part of CI close to the ubiquinone-binding pocket). Nad3 (Y1), Nad4L (Y6), and Nad6 (Y5), together with Nad1, are thought to constitute the proton E-channel (10, 55), whereas Nad2 (Y3) is an antiporter-type subunit forming a proton channel on its own.
In Nad2 (Y3) of diplonemids, TMH5 contains a mid-helix tyrosine (Tyr-TMH5) instead of the canonical Lys-TMH5 (TMH8) that has been proposed to play a crucial role in the hydration of the proton channel (56). (The helix numbering refers to the mammalian Nad2 protein with bacterial/canonical numbering in parentheses.) Apparently, any polar residue is suited for this hydration, allowing the substitution of the conserved lysine by tyrosine, not only in diplonemids, but also in kinetoplastids and ciliates (Fig. 2A). Further, diplonemid Nad2 lacks the highly conserved mid-helix Glu-TMH2 (TMH5), a residue that is considered to transmit the electrostatic signal during the proton-pumping cycle (see Di Luca et al. (56) and references therein). We propose that a diplonemid-specific and invariable Asp in TMH3 functionally replaced the canonical Glu-TMH2, because the former occupies a position that is just opposite of the latter in the 3D structure model of Nad2 (Fig. S7).
Conventional mitochondrial Nad3 is characterized by the quasi-universally conserved motif (Y/F)ECGF in the disordered loop that is located between TMH1 and TMH2, whereas in diplonemids, the corresponding motif is Y(D/E)AG(I/V) (Fig. 2B). The central Cys residue (which in mitochondria is only rarely substituted by Ser, a substitution seen, for example, in many bacteria) has been implicated in the switching of Complex I between the active and de-active state and was proposed to be a site of redox regulation (57,58). CI of diplonemids might have lost this capacity or, alternatively, may employ a different redox sensor.
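The two motif variants can be written as regular expressions, which makes the difference at the central position explicit. The following sketch is illustrative only: the pattern names, the function, and the two short test peptides are hypothetical and are not taken from any real Nad3 sequence.

```python
import re

# Canonical mitochondrial Nad3 loop motif (Y/F)ECGF.
CANONICAL = re.compile(r"[YF]ECGF")
# Diplonemid counterpart Y(D/E)AG(I/V), as described in the text.
DIPLONEMID = re.compile(r"Y[DE]AG[IV]")

def classify_loop(seq):
    """Label a loop sequence by the motif variant it contains, if any."""
    if CANONICAL.search(seq):
        return "canonical"
    if DIPLONEMID.search(seq):
        return "diplonemid-like"
    return "none"

# Hypothetical peptides, for illustration only.
print(classify_loop("LSPYECGFDP"))  # canonical
print(classify_loop("LSPYDAGVDP"))  # diplonemid-like
```

Note that the diplonemid pattern has Ala where the canonical motif has the Cys implicated in the active/de-active transition.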
Diplonemid Nad6 (Y5), the least conserved of all Y proteins, was recognized by a Nad6-like distribution of polar and apolar residues and a segment in TMH3 that usually contains a mid-helix Tyr. This residue is considered to be involved in the proton E-channel hydration (10,55) or to participate in the conformational changes of CI during its catalytic cycle (42). In D. papillatum (and F. neradi), a Phe residue occupies this position. The replacement of the polar aromatic Tyr by the nonpolar Phe likely does not affect the hydration of the E-channel as a whole, given that most other functionally important residues of this channel (4) are well-conserved across diplonemids. Another possibility is that a patch of hydrophilic residues occurring in the preceding helix of diplonemid Nad6 compensates for the absence of Tyr-TMH3 (Fig. S2). Alternatively, it may be the aromatic character of the residue that is more critical due to its role in conformational changes (42).

Composition of Complex I from D. papillatum
As compiled in Table 2, CI from D. papillatum (and by inference also that of the other examined diplonemids) consists of 36 assigned components: the universal set of 14 core subunits, 17 broadly distributed eukaryotic accessory subunits, four accessory subunits apparently restricted to Euglenozoa and Heterolobosea, and one other accessory subunit thus far identified only in CI of Trypanosoma (i.e. kinetoplastid + diplonemid-specific; Fig. 4). In addition, we identified a set of 15 proteins that are putative diplonemid-specific CI components (Table 3). Some of these novel proteins may functionally substitute for accessory subunits missing from Diplonema but otherwise broadly conserved, notably those that are transmembrane proteins of the Pp and Pd modules of CI (Fig. 4 and Table S3).
Given these 51 proteins, the predicted mass of CI from Diplonema is ~1.6 MDa, which is similar to that from other euglenozoans, but much larger than that from animals, plants, and fungi (Table 4). The latter taxa not only contain fewer lineage-specific components (Fig. 4), but many of their conventional subunits are much shorter (Table 5).
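The subunit counts given above are mutually consistent, as the following tally shows. It uses only numbers stated in the text; the dictionary keys and variable names are our own shorthand.

```python
# Assigned CI components by category, as listed in the text.
assigned = {
    "core": 14,
    "accessory, broadly distributed": 17,
    "accessory, Euglenozoa + Heterolobosea": 4,
    "accessory, kinetoplastid + diplonemid": 1,
}
putative_diplonemid_specific = 15
mtdna_encoded = 10  # Nad proteins encoded by mtDNA

n_assigned = sum(assigned.values())                   # 36
n_total = n_assigned + putative_diplonemid_specific   # 51
n_conventional_nuclear = n_assigned - mtdna_encoded   # 26
print(n_assigned, n_total, n_conventional_nuclear)    # 36 51 26
```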
Several proteins associated with diplonemid CI are worth discussing in more detail. The first two are m.3369 (CA) and m.27345 (CAL; Table 2), members of the γCA family. In plants (but not opisthokonts (21,24)), these proteins are important and integral CI components (28,59), and they have also been detected in this complex of algae, kinetoplastids, euglenids, and amoebozoans (Fig. 4) (18,20,22,60). In plants, γCAs have been proposed to interconvert CO2, water, and their bicarbonate ion and to play a role in the transport of carbon dioxide and bicarbonate across membranes. However, a specific functional connection between this enzyme class and CI is unclear. Moreover, in many eukaryotes, most of the residues proposed to be essential for the γCA catalysis have been substituted (24); this is also the case for the two diplonemid proteins. While in Diplonema CI bands we detected two γCA family members, the nuclear genome encodes in total five paralogs, each with a predicted mitochondrial targeting peptide (data not shown). As suggested for the five γCA and γCA-like proteins in Arabidopsis (26), the paralogs from Diplonema may form heterooligomers of different composition under various growth conditions or life cycle stages.
Second, the NDUFAB1 homolog (m.4241; mitochondrial acyl-carrier protein) of Diplonema is probably a multifunctional protein with copies occurring in several different protein complexes, as observed in other eukaryotes. Specifically, the counterpart in yeast and humans not only serves as a structural scaffold in CI, where it binds the LYR family subunits NDUFA6 and NDUFB9 (61-63), but also is part of the assembly machinery of other respiratory complexes and further acts as the multifunctional mitochondrial acyl carrier protein in mitochondrial fatty acid synthesis (64). The multiple locations of NDUFAB1 could explain the moderate enrichment reported here for this protein in Diplonema CI preparations compared with whole mitochondrial lysate (Fig. 5).
NDUTB2/3 and NDUTB17, which occur in CI only in trypanosomes (20) and diplonemids (Table 2 and Fig. 4), have significant sequence similarity to 2-enoyl-thioester reductase and acyl-CoA synthetase, respectively. These three integral complex subunits could assure a physical connection between CI and the fatty acid degradation pathway by channeling electrons harvested from NADPH to CI.
Finally, one of the putative diplonemid-specific accessory CI components (m.2267) is an iron-dependent superoxide dismutase homolog (Fe/Mn-SOD-type; Table 3). SOD enzymes transform reactive oxygen species, generated by mitochondrial electron transport, to hydrogen peroxide. Although SOD is present in nearly all mitochondria, its specific localization is not known except in Caenorhabditis elegans, where SOD proteins have been shown to be associated with the respiratory chain supercomplex I:III:IV (65). Interestingly, CI of Trypanosoma includes two paralogs of an unrelated enzyme with the same catalytic function (Fe-SOD; NDUTB10 and -11) (20).

Organization of Diplonema's respiratory complexes into a respirasome
The largest BN-PAGE band displaying NADH-dehydrogenase activity that we isolated from D. papillatum (Fig. 3) contained not only subunits of CI, but also of CIII and CIV, present in abundance and enrichment comparable with those of CI (Fig. 5, Fig. S6, and Table S2). This result indicates that the large band represents Diplonema's respirasome (i.e. the supercomplex I:III:IV). The supercomplex from Diplonema likely has the identical complex ratio of 1:2:1 as the predominant form in animals and fungi (28), in contrast to the plant supercomplex, which consists only of CI and CIII (SCI₁III₂) (28,66).
Notably, Diplonema's SCI:III:IV is unusually stable, even in preparations in which relatively strong detergents have been employed, n-dodecyl-β-D-maltoside (DDM) or Triton X-100 instead of digitonin, the latter being required for supercomplex isolation in most model organisms (e.g. see Schägger and Pfeiffer (67); Fig. 3). Therefore, the respirasome of Diplonema is a promising candidate for 3D structure determination. Specifically, it would be interesting to validate the above proposed structural deviations of CI subunits from D. papillatum, in particular the loss of a TMH in both Nad6 and Nad4L, the resulting unconventional helix packing of these components, and the remodeled protein-protein interaction between Nad6 and Nad4L.

Conclusions
Because the bacterial CI seems to work well with the 14 core subunits alone, a debate has ensued about the role of the numerous accessory subunits in the mitochondrial complex. Hypotheses range from catalysis optimization (15) to stability (68) to assembly (63). In fact, the issue of the raison d'être of supernumerary components applies to all molecular machineries of the eukaryotic cell that are of bacterial origin. The shortcoming of the hypotheses noted above is their presumption that the only evolutionary force is natural selection, ignoring the possibility that random genetic drift (e.g. constructive neutral evolution (69-71)) may equally well lead to complexity.
A connected question is why CI contains enzymes whose functions are unrelated to electron transport and proton pumping. Again, CI is not the only respiratory complex with functionally unconnected protein associates. One among many examples is Complex III, to which the mitochondrial protein peptidase is tethered (e.g. see Mach et al. (72) and references therein). In our view, the question is not necessarily appropriate, because (i) some of the so-called supernumerary proteins may not be additions to a particular complex but rather to the respirasome, which serves as a physical anchor, and (ii) these proteins may just happen to co-purify with one or the other respiratory complex, depending on the strength of particular protein-protein interactions within the respirasome.

Isolation of mitochondria
Cells were grown until late exponential phase; harvested by centrifugation; resuspended in a buffer containing 1.2 M sorbitol, 20 mM HEPES, pH 7.5, 2.5 mM EDTA, pH 8.0, and 1× cOmplete EDTA-free protease inhibitors (Roche Applied Science); and then lysed in a nitrogen cavitation chamber (Parr Instrument Co.) under 30-bar nitrogen pressure. The cell lysate was ultracentrifuged on a two-step sucrose gradient (36 and 60%), and the fraction enriched in mitochondria was collected from the 36/60% sucrose interface. A detailed protocol is available at https://doi.org/10.17504/protocols.io.fbqbimw.

Preparation of mitochondrial lysates, native PAGE, and in-gel activity staining
Liquid nitrogen-cooled mitochondrial pellets were pulverized using TissueLyserII (Qiagen) in the lysis buffer (20 mM Tris-HCl, pH 7.5, 40 mM KCl, 3 mM MgCl2, 2.5 mM DTT, and 1× cOmplete EDTA-Free), and the grindate was stored at −80°C. To obtain a membrane-enriched fraction, a lysate aliquot corresponding to 800 µg of proteins was centrifuged, and the pelleted membranes were solubilized with DDM in the lysis buffer or in the loading buffer (50 mM BisTris, pH 7.0, 50 mM NaCl, 2 mM 6-aminohexanoic acid, 1× cOmplete EDTA-Free). Cleared detergent-solubilized lysates were mixed with Coomassie Brilliant Blue (CBB) G-250 for BN-PAGE, and the electrophoretic separation of protein complexes was performed according to published protocols (76-78) with minor modifications (see supporting Experimental procedures). Complex I was detected within the polyacrylamide gel by adding NADH and staining with nitro blue tetrazolium, and Complex V by adding ATP and Pb(NO3)2, essentially as described (79).

Sample preparation for MS and protein identification and quantification
Samples of whole mitochondria for tandem mass spectrometry (MS/MS) were prepared by two approaches (for details, see supporting Experimental procedures). Briefly, in the first approach, thawed mitochondria were heat-denatured in the presence of 2% SDS and concentrated by electrophoresis in a Tris-glycine SDS-polyacrylamide stacking gel. The band was cut out, and proteins were fixed by methanol and acetic acid before submission for MS. In the second approach, the mitochondrial grindate was denatured essentially as above, but to remove the detergent, the sample was then diluted to 0.05% SDS in 6 M urea buffered with 100 mM Tris, pH 8.5, and concentrated using an Amicon Ultra 10,000 molecular weight cutoff device, essentially following the FASP protocol (80). For the analyses of protein complexes, samples were electrophoretically separated and then excised from CBB-stained gels.
Proteomics sample processing and analyses, including enzymatic digestion (trypsin or trypsin and chymotrypsin), MS, and peptide searches using Mascot in Proteome Discoverer (Matrix Science), were outsourced to the Proteomics Discovery Platform at the Institut de Recherches Cliniques de Montréal (IRCM) and to the Center for Advanced Proteomics Analyses at the Institut de Recherche en Immunologie et en Cancérologie (IRIC) in Montreal. Spectra were also searched in-house with MaxQuant version 1.6.1.0 following published protocols (81,82). We used a custom database of D. papillatum proteins (based on the ORFs predicted from mitochondrial and nuclear transcripts). Depending on the particular protein sample (see Table S1), either trypsin or trypsin and chymotrypsin were specified as the digestion enzyme(s), allowing up to two and three missed cleavage sites, respectively.
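To illustrate what the missed-cleavage setting means, the sketch below performs a simplified in silico tryptic digest. It assumes the common textbook rule that trypsin cleaves C-terminal to Lys or Arg except before Pro; this is an approximation and not necessarily the exact rule set applied by Mascot or MaxQuant. The function name and the example sequence are hypothetical.

```python
import re

def tryptic_peptides(protein, max_missed=2):
    """Return peptides from a simplified tryptic digest.

    Cleaves after K or R unless the next residue is P, and also returns
    peptides spanning up to `max_missed` uncleaved sites.
    """
    # Zero-width split (Python 3.7+): after K/R, not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides

# Hypothetical sequence, for illustration only.
print(tryptic_peptides("MKFFRPLLKAG", max_missed=1))
```

Allowing more missed cleavages enlarges the search space with longer peptides, which is one plausible reason to permit three rather than two when two enzymes are combined.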
Proteins were quantified by calculating PAI and iBAQ, essentially as devised by Rappsilber et al. (46) and Schwanhäusser et al. (45), respectively. To identify proteins enriched in BN-PAGE bands displaying NADH-dehydrogenase activity, we compared the normalized protein abundances in bands with those in mitochondrial lysates. For further details on protein identification and quantification, see supporting Experimental procedures.
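The two abundance measures and the enrichment ratio can be sketched as follows. The definitions follow the cited methods in outline (PAI as observed over theoretically observable peptides; iBAQ as summed peptide intensity over the number of theoretically observable peptides). Normalizing to the sample total is an assumption made here for illustration, and all function names, protein labels, and numbers are hypothetical.

```python
def pai(n_observed, n_observable):
    """Protein abundance index: observed / theoretically observable peptides."""
    return n_observed / n_observable

def ibaq(peptide_intensities, n_observable):
    """iBAQ: summed peptide intensities / theoretically observable peptides."""
    return sum(peptide_intensities) / n_observable

def normalize(abundances):
    """Scale abundances so they sum to 1 within a sample (assumed scheme)."""
    total = sum(abundances.values())
    return {protein: value / total for protein, value in abundances.items()}

def enrichment(band, lysate):
    """Ratio of normalized abundance in a band to that in the lysate."""
    band_n, lysate_n = normalize(band), normalize(lysate)
    return {p: band_n[p] / lysate_n[p] for p in band_n if lysate_n.get(p)}

# Hypothetical iBAQ values for two proteins.
band = {"Nad-like": ibaq([4e6, 2e6], 4), "other": ibaq([1e6], 5)}
lysate = {"Nad-like": ibaq([1e6], 4), "other": ibaq([9e6], 5)}
print(enrichment(band, lysate))
```

In this toy example the first protein has an enrichment above 1 and the second below 1, which is the kind of contrast used to separate candidate complex components from background proteins.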

Protein sequence analyses
Assignment of Y proteins was performed by searching for their homologs among the mtDNA-encoded CI proteins, previously unidentified in diplonemids, from a wide variety of eukaryotes. Published protein sequences were first clustered using CD-HIT (83), and from among the cluster representatives, ~100 reliable sequences per protein were manually selected. For each protein, selected RefSeq sequences were aligned using MUSCLE (84) and MAFFT (85), and HHsuite (86) was then used to generate profile HMMs and to compare each profile with profile HMMs of Y proteins. The best scoring hit for each Y protein was selected for multiple-sequence alignment. For additional assignment criteria used, see supporting Results and supporting Experimental procedures. A phylogenetic tree was constructed with PhyloBayes version 4.1c (87), essentially as described previously (38).
For the identification of nucleus-encoded subunits, the custom database of nucleus-encoded D. papillatum proteins was searched by BLAST (92) with a collection of previously vali-