Target enrichment of conserved nuclear loci has helped reconstruct evolutionary relationships among a wide variety of species. While there are preexisting bait sets to enrich a few hundred loci across all fishes or a thousand loci from acanthomorph fishes, no bait set exists to enrich large numbers (>1,000 loci) of ultraconserved nuclear loci from ostariophysans, the second largest actinopterygian superorder. In this study, we describe how we designed a bait set to enrich 2,708 ultraconserved nuclear loci from ostariophysan fishes by combining an existing genome assembly with low-coverage sequence data collected from two ostariophysan lineages. We perform a series of enrichment experiments using this bait set across the ostariophysan tree of life, from the deepest splits among the major groups (>150 Ma) to more recent divergence events that have occurred during the last 50 million years. Our results demonstrate that the bait set we designed is useful for addressing phylogenetic questions from the origin of crown ostariophysans to more recent divergence events, and our in silico results suggest that this bait set may be useful for addressing evolutionary questions in closely related groups of fishes, like Clupeiformes.
Target enrichment of highly conserved, phylogenetically informative loci (Faircloth et al., 2012) has helped researchers reconstruct and study the evolutionary history of organismal groups ranging from cnidarians and arthropods to vertebrate clades such as birds and snakes (Moyle et al., 2016; Streicher and Wiens, 2016; Branstetter et al., 2017; Quattrini et al., 2018). Among fishes, researchers have designed enrichment bait sets that can collect data from hundreds of loci shared among a majority of ray-finned fishes (Actinopterygii; Faircloth et al., 2013) or more than one thousand loci shared among actinopterygian subclades (Alfaro et al., 2018) like the group of spiny-finned fishes that dominates the world's oceans (Acanthomorpha; 19,244 species). The scale of data collection enabled by these approaches is unprecedented: a single researcher can collect sequence data from hundreds or thousands of loci across hundreds of taxa in a matter of weeks. The genome-wide distribution of these loci can then be leveraged to resolve relationships that were previously intractable (Alfaro et al., 2018), redefine our knowledge of the tempo of evolutionary change (Harrington et al., 2016), and help explain why relationships in some fish groups are so difficult to reconstruct (Alda et al., 2019).
Although bait sets have been designed to work broadly across actinopterygians and more specifically within acanthomorphs, no target enrichment bait set exists that is tailored to collect sequence data from conserved loci shared by ostariophysan fishes, which constitute the second largest actinopterygian superorder (Ostariophysi; 10,887 species). This ostariophysan radiation (Fig. 1) has produced the majority (∼70%) of the world's freshwater fishes and includes catfishes, the milkfish, tetras, minnows, electric knifefishes, and their allies. The evolutionary success of ostariophysans may stem from their shared, derived possession of an alarm substance called Schreckstoff (von Frisch, 1938) and/or a remarkable modification of the anterior vertebral column known as the Weberian apparatus (Weber, 1820; Rosen and Greenwood, 1970), which enhances hearing by transmitting sound vibrations from the swim bladder to the inner ear. Morphological (Rosen and Greenwood, 1970; Fink and Fink, 1981, 1996) and molecular studies (Dimmick and Larson, 1996; Saitoh et al., 2003; Nakatani et al., 2011; Betancur-R et al., 2013; Arcila et al., 2017; Chakrabarty et al., 2017) have demonstrated monophyly of the clade and provided numerous hypotheses of relationships among the five ostariophysan orders (reviewed in Arcila et al., 2017, and Chakrabarty et al., 2017). Because several of these phylogenetic hypotheses disagree substantially, major questions about ostariophysan evolution remain unresolved. For example, some studies suggest that Siluriformes (catfishes) and Gymnotiformes (electric knifefishes) are not each other's closest relatives (Nakatani et al., 2011; Dai et al., 2018), which would imply that the electroreceptive capacities of these two orders evolved independently.
Other studies have suggested the non-monophyly of the Characiformes (Chakrabarty et al., 2017), which implies a more complicated pattern of evolution in the morphology and development of oral dentition and other anatomical systems in this group, as well as suggesting an alternative biogeographical hypothesis to the classical Gondwanan vicariance model (Lundberg, 1993; Sanmartín and Ronquist, 2004). A similar debate concerns the composition of the immediate outgroups to Ostariophysi (see discussion in Lavoué et al., 2014), which involve the enigmatic marine family Alepocephalidae (slickheads), as well as the world's diverse radiation of Clupeiformes (herrings and anchovies), a taxonomic order long allied to Ostariophysi on the basis of anatomical and molecular evidence (Lecointre, 1995).
Though molecular and morphological hypotheses of interfamilial and intergeneric relationships have been advanced within each of the five ostariophysan orders, substantial work remains before our understanding of the evolutionary history of ostariophysans will rival that of the best studied acanthomorph groups, such as cichlids (Brawand et al., 2014; Malinsky et al., 2018). The majority of previous work among ostariophysans has involved parsimony analysis of osteological characters or model-based analysis of multilocus Sanger datasets, with even the largest molecular studies (e.g., Schönhuth et al., 2018) including fewer than 15% of the species diversity in the targeted clades. At the genome scale, ostariophysans have been included in studies sampling across the diversity of ray-finned fishes (e.g., Faircloth et al., 2013; Hughes et al., 2018), while studies focusing on Ostariophysi have only recently begun to appear (Arcila et al., 2017; Chakrabarty et al., 2017; Dai et al., 2018). However, these genome-scale projects have sampled fewer than 1% of total ostariophysan species diversity and have only begun to address questions about the relationships among families or genera. A robust and well-documented approach to collect a large number of nuclear loci across ostariophysan orders and appropriate outgroups will accelerate our ability to conduct taxon-rich studies of phylogenetic relationships within and across the group and allow us to synthesize these data into a more complete and modern picture of ostariophysan evolution than previously possible.
Here, we describe the design of an enrichment bait set that targets 2,708 conserved, nuclear loci shared among ostariophysan fishes, and we empirically demonstrate how sequence data collected using this bait set can resolve phylogenetic relationships at several levels of divergence across the ostariophysan tree of life, from the deepest splits among ostariophysan orders and their outgroup (Otocephala, crown age 210–178 megaannum [Ma]; Hughes et al., 2018) to more recent divergence events among lineages comprising the Gymnotiformes (crown age 86–43 Ma) or Anostomoidea (crown age within 76–51 Ma; Hughes et al., 2018). An earlier study (Arcila et al., 2017) developed a bait set targeting 1,068 exon loci shared among otophysans, one of the ostariophysan subclades that includes Characiformes, Cypriniformes, Gymnotiformes, and Siluriformes (Fig. 1). The bait set that we describe differs from that of Arcila et al. (2017) by targeting a larger number of loci that includes coding and non-coding regions shared among a larger and earlier diverging clade (i.e., ostariophysans and their proximate outgroups). As with most bait sets targeting conserved loci shared among related groups, the designs are generally complementary rather than incompatible, and researchers can easily combine loci targeted by both designs to accomplish their research objectives.
MATERIALS AND METHODS
Conserved element identification and bait design
To identify conserved elements shared among the ostariophysans, we followed the general workflow described in Faircloth (2017). Specifically, we generated low-coverage, whole-genome sequencing data from Apteronotus albifrons and Corydoras paleatus, and we aligned these low-coverage, raw reads to the genome assembly of Danio rerio (hereafter danRer7; NCBI GCA_000002035.2) using stampy v1.0.21 with the substitution rate set to 0.05. We used a substitution rate of 0.05 because previous experience suggested this value allows reads to map to parts of the genome that can be captured consistently using 120 bp enrichment baits while simultaneously reducing the number of read mappings to potentially paralogous regions. After read mapping, we followed the procedure outlined in Faircloth (2017) to identify conserved loci and design baits to enrich these loci. Full details of the locus identification and bait design approach we used are provided in the Supplemental Information (see Data Accessibility).
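The core idea of this step, finding stretches of the base genome to which reads from every exemplar taxon map, can be sketched in Python as an interval-intersection problem. This is a simplified illustration, not the phyluce implementation: it assumes mapped reads have already been reduced to hypothetical (chromosome, start, end) intervals on danRer7, and the function names, toy coordinates, and 100 bp minimum length are illustrative choices. The workflow in Faircloth (2017) adds further filtering steps that are omitted here.

```python
from collections import defaultdict

# Hypothetical representation: (chromosome, start, end), 0-based, half-open.
Interval = tuple[str, int, int]


def merge_intervals(intervals: list[Interval]) -> list[Interval]:
    """Merge overlapping or abutting read alignments on the same chromosome."""
    merged: list[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Return the regions covered by both lists of merged intervals."""
    b_by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in b:
        b_by_chrom[chrom].append((start, end))
    shared = []
    for chrom, start, end in a:
        for b_start, b_end in b_by_chrom[chrom]:
            lo, hi = max(start, b_start), min(end, b_end)
            if lo < hi:  # non-empty overlap
                shared.append((chrom, lo, hi))
    return sorted(shared)


def conserved_regions(mappings: dict[str, list[Interval]], min_length: int = 100) -> list[Interval]:
    """Keep base-genome regions covered by reads from every taxon."""
    taxa = list(mappings)
    shared = merge_intervals(mappings[taxa[0]])
    for taxon in taxa[1:]:
        shared = intersect(shared, merge_intervals(mappings[taxon]))
    return [r for r in shared if r[2] - r[1] >= min_length]


# Toy coordinates, not real data.
reads = {
    "Apteronotus albifrons": [("chr1", 100, 400), ("chr1", 350, 600), ("chr2", 0, 50)],
    "Corydoras paleatus": [("chr1", 200, 700), ("chr2", 10, 40)],
}
print(conserved_regions(reads))  # [('chr1', 200, 600)]
```

The shared chr2 region is discarded because it is shorter than the illustrative 100 bp minimum.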
Empirical sequence data collection overview
To test the utility of the resulting bait set for ostariophysan phylogenetics, we designed several experiments that spanned the breadth of species diversity (Table 1) and divergence times in this group. Different research groups performed target captures spanning a range of subclade ages from young (<50 Ma) to old (∼200 Ma): Gymnotiformes (crown age 83–46 Ma; Hughes et al., 2018), Anostomoidea (a characiform subclade that includes headstanders and detritivorous characiforms; crown age falls within 76–51 Ma; Hughes et al., 2018), Loricarioidei (armored catfishes; crown age 116–131 Ma; Rivera-Rivera and Montoya-Burgos, 2017), and the Characiformes sensu lato (tetras and allies; crown age 133–112 Ma; Hughes et al., 2018). We then combined data from several species within each group with additional enrichments from outgroup lineages and conserved loci harvested from available genome sequences to create a dataset spanning Otocephala, a diverse teleostean clade that includes ostariophysans and clupeomorphs (sardines, herrings and allies; crown age 210–178 Ma). Specific details regarding the laboratory methods for each experiment can be found in the Supplemental Information (see Data Accessibility).
Sequence data quality control and assembly
After sequencing, we received FASTQ data from each sequencing provider, and we removed adapters and trimmed low-quality bases from the sequence data using illumiprocessor (https://illumiprocessor.readthedocs.io/), a wrapper around Trimmomatic (Bolger et al., 2014). We assembled trimmed reads using a phyluce wrapper around the Trinity assembly program (Grabherr et al., 2011). Before creating datasets for phylogenetic processing, we integrated the sequence data collected in vitro with those collected in silico.
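To make the quality-trimming step concrete, here is a minimal Python sketch of sliding-window trimming, the general idea behind one of the trimming operations Trimmomatic offers. It is a simplification that does not reproduce Trimmomatic's exact cut-point rules, it assumes Phred+33 quality encoding, and the window size and quality threshold are illustrative values rather than the parameters illumiprocessor used in this study. Adapter removal is not shown.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        min_mean_q: float = 20.0) -> tuple[str, str]:
    """Cut the read at the start of the first window whose mean quality is below min_mean_q."""
    scores = phred33_scores(quality)
    # Scan windows from the 5' end; a read shorter than `window` is tested as one window.
    for start in range(max(len(scores) - window + 1, 1)):
        window_scores = scores[start:start + window]
        if window_scores and sum(window_scores) / len(window_scores) < min_mean_q:
            return seq[:start], quality[:start]
    return seq, quality


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(sliding_window_trim("ACGTACGTAC", "IIIIIII###"))  # ('ACGTAC', 'IIIIII')
```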
In silico sequence data collection
We used computational approaches to extract data from 11 fish genome assemblies available from UCSC, NCBI, and other sites (Table 1). We identified and extracted UCE loci that matched the ostariophysan bait set using phyluce and a standardized workflow (Faircloth, 2015), except that we set the sequence coverage threshold to 67% and the sequence identity threshold to 80%. We used these values because they tend to produce a slightly more complete set of loci for downstream filtering in the phyluce phylogenetic workflow. After locus identification, we sliced each UCE locus, plus 500 bp of flanking sequence on either side, from each genome and wrote these slices to FASTA files identical in format to those generated from assemblies of the samples we processed in vitro. Once we had harvested the in silico data, we merged them with the in vitro data and processed both simultaneously.
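The filter-and-slice logic described above can be sketched as follows. This is a hypothetical illustration, not phyluce's code: the `BaitMatch` record, its field names, and the helper functions are assumptions, and the sketch ignores additional checks (for example, how loci with multiple genomic matches are handled). Only the 67% coverage threshold, the 80% identity threshold, and the 500 bp flank come from the text.

```python
from dataclasses import dataclass


@dataclass
class BaitMatch:
    """Hypothetical summary of one locus-to-genome match (0-based, half-open)."""
    locus: str
    contig: str
    start: int
    end: int
    coverage: float  # percent of the bait sequence covered by the alignment
    identity: float  # percent identity across aligned bases


def passes_filters(m: BaitMatch, min_coverage: float = 67.0, min_identity: float = 80.0) -> bool:
    return m.coverage >= min_coverage and m.identity >= min_identity


def slice_with_flanks(genome: dict[str, str], m: BaitMatch, flank: int = 500) -> str:
    """Return the match plus up to `flank` bp on each side, clipped at contig ends."""
    contig = genome[m.contig]
    left = max(0, m.start - flank)
    right = min(len(contig), m.end + flank)
    return contig[left:right]


def matches_to_fasta(genome: dict[str, str], matches: list[BaitMatch], flank: int = 500) -> str:
    """Write filtered, flanked slices as FASTA records named by locus."""
    records = [f">{m.locus}\n{slice_with_flanks(genome, m, flank)}"
               for m in matches if passes_filters(m)]
    return "".join(record + "\n" for record in records)


genome = {"scaffold_1": "A" * 1000 + "C" * 120 + "G" * 1000}
hits = [
    BaitMatch("uce-1", "scaffold_1", 1000, 1120, coverage=92.0, identity=88.0),
    BaitMatch("uce-2", "scaffold_1", 1500, 1620, coverage=55.0, identity=90.0),  # fails coverage
]
fasta = matches_to_fasta(genome, hits)
print(fasta.count(">"), len(fasta.splitlines()[1]))  # 1 1120
```

Clipping at contig ends matters for draft assemblies, where a locus near the end of a short scaffold cannot receive the full 500 bp on both sides.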
UCE identification, alignment, and phylogenetic analyses
We used a standard workflow (https://phyluce.readthedocs.io/en/latest/tutorial-one.html) and programs within phyluce to identify and filter non-duplicate contigs representing conserved loci enriched by the ostariophysan bait set (hereafter UCEs). Then, we used lists of taxa to create one dataset for each taxonomic group outlined in Table 1, and we extracted FASTA data from the UCE contigs enriched for group members. We exploded these data files by taxon to compute summary metrics for UCE contigs, and we used phyluce to generate MAFFT v7 (Katoh and Standley, 2013) alignments of all loci. We trimmed alignments using trimAl (Capella-Gutiérrez et al., 2009) and its ‘-automated1’ routine, and we computed alignment statistics using phyluce. We then generated 75% complete data matrices (in which each locus is present in at least 75% of the taxa in a dataset) for all datasets, and we computed summary statistics across each 75% complete matrix. We concatenated alignments using phyluce, and we conducted maximum likelihood (ML) tree and bootstrap replicate searches under the GTRGAMMA substitution model using RAxML (v8.0.19). We used the ‘autoMRE’ bootstopping criterion of RAxML to determine automatically when to stop generating bootstrap replicates. Following the best and bootstrap ML tree searches, we added bootstrap support values to each best tree using RAxML. We did not test different data partitioning strategies (e.g., Tagliacollo and Lanfear, 2018) or run Bayesian or coalescent-based analyses because our goal was to determine whether this bait set produced reasonable results at the levels of divergence examined, rather than to exhaustively analyze the evolutionary relationships among the taxa included.
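The 75% completeness filter and concatenation steps can be illustrated with a short Python sketch. The function names and the dictionary-of-alignments layout are hypothetical, not phyluce's API. The sketch rounds the minimum taxon count down, giving minimums of 6 of 9, 11 of 15, and 16 of 22 taxa; these match the lower bounds of the ranges reported in the Results if those maxima are the dataset totals, but how phyluce itself rounds is an assumption here. Taxa missing from a locus are filled with '?' in the concatenated matrix.

```python
import math


def min_taxa_required(total_taxa: int, completeness: float = 0.75) -> int:
    """Minimum taxa a locus needs; rounding down is an assumption (see text)."""
    return math.floor(completeness * total_taxa)


def filter_complete_loci(alignments: dict[str, dict[str, str]], all_taxa: list[str],
                         completeness: float = 0.75) -> dict[str, dict[str, str]]:
    """Keep loci (locus -> {taxon: aligned sequence}) present in enough taxa."""
    needed = min_taxa_required(len(all_taxa), completeness)
    return {locus: aln for locus, aln in alignments.items() if len(aln) >= needed}


def concatenate(alignments: dict[str, dict[str, str]], all_taxa: list[str]) -> dict[str, str]:
    """Join loci end to end per taxon, padding missing taxa with '?'."""
    rows: dict[str, list[str]] = {taxon: [] for taxon in all_taxa}
    for locus in sorted(alignments):
        aln = alignments[locus]
        width = len(next(iter(aln.values())))
        for taxon in all_taxa:
            rows[taxon].append(aln.get(taxon, "?" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


taxa = ["A", "B", "C", "D"]
loci = {
    "uce-1": {"A": "T", "B": "T", "C": "T", "D": "A"},
    "uce-2": {"A": "AC", "B": "AC", "C": "AG"},
    "uce-3": {"A": "GG", "B": "GG"},  # only 2 of 4 taxa: dropped
}
kept = filter_complete_loci(loci, taxa)
print(sorted(kept), concatenate(kept, taxa)["D"])  # ['uce-1', 'uce-2'] A??
```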
Computing overlap between bait sets
Several recent studies have detailed similar bait sets for the targeted enrichment of UCE loci—a general bait set targeting 500 UCE loci shared among actinopterygian lineages (Faircloth et al., 2013) and a more specific bait set targeting 1,314 UCE loci shared among acanthomorph lineages (Alfaro et al., 2018). To demonstrate the differences and similarities between the bait sets targeting UCE loci described in these earlier studies and the ostariophysan UCE loci and bait set described as part of this study, we computed the intersection of bait sets across several genome-enabled actinopterygian taxa that represent major lineages within the group: Danio rerio, Lepisosteus oculatus, Oryzias latipes, and Scleropages formosus. We selected these specific taxa because each had reasonably well-assembled genome sequences, and because two of the four (Danio rerio and Oryzias latipes) were used to design baits in each of the sets we compared. To compute these intersections, we followed the standard protocol for identifying UCE loci from genome assemblies using phyluce mentioned above (https://phyluce.readthedocs.io/en/latest/tutorial-three.html). Then, we sliced UCE loci from each genome sequence including 25 base pairs to each side of the match location. We converted the resulting FASTA files to BED (Browser Extensible Data) format using a utility script from phyluce, and we used a combination of BEDTools (intersect) and GNU coreutils v8.4 (comm, uniq, and wc) to count the number of shared overlaps among different bait sets, using the ostariophysan bait set described herein as the reference set of UCE loci. We plotted overlaps as Venn diagrams for each taxon using Adobe Illustrator (v23.0.4).
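The counting step can be sketched in Python, assuming each bait set's loci have already been sliced from one genome (with 25 bp added to each side) and reduced to (chromosome, start, end) intervals. This is a hypothetical stand-in for the BEDTools intersect and coreutils pipeline, not a reproduction of it: it classifies each ostariophysan (reference) locus by which of the other two bait sets it overlaps and counts each reference locus once, yielding the reference-centered regions of a three-way Venn diagram. Function names and toy coordinates are illustrative.

```python
from collections import Counter

Interval = tuple[str, int, int]  # (chromosome, start, end), 0-based, half-open


def pad(interval: Interval, flank: int) -> Interval:
    """Extend an interval by `flank` bp on each side, clipping at position 0."""
    chrom, start, end = interval
    return chrom, max(0, start - flank), end + flank


def overlaps_any(interval: Interval, others: list[Interval]) -> bool:
    chrom, start, end = interval
    return any(c == chrom and s < end and start < e for c, s, e in others)


def venn_counts(reference: list[Interval], set_a: list[Interval], set_b: list[Interval],
                flank: int = 25) -> Counter:
    """Count reference loci by overlap with bait sets A and B (each locus counted once)."""
    ref = [pad(i, flank) for i in reference]
    a = [pad(i, flank) for i in set_a]
    b = [pad(i, flank) for i in set_b]
    counts: Counter = Counter()
    for locus in ref:
        in_a, in_b = overlaps_any(locus, a), overlaps_any(locus, b)
        if in_a and in_b:
            counts["shared with A and B"] += 1
        elif in_a:
            counts["shared with A only"] += 1
        elif in_b:
            counts["shared with B only"] += 1
        else:
            counts["reference only"] += 1
    return counts


# Toy coordinates for one genome, not real data.
ostariophysan = [("chr1", 100, 220), ("chr1", 1000, 1120), ("chr2", 50, 170),
                 ("chr3", 0, 120), ("chr4", 0, 100)]
actinopterygian = [("chr1", 200, 320), ("chr2", 60, 100), ("chr4", 130, 250)]
acanthomorph = [("chr1", 150, 160), ("chr1", 1100, 1300)]
# The chr4 locus overlaps set A only because of the 25 bp padding.
print(venn_counts(ostariophysan, actinopterygian, acanthomorph))
```

Counting per reference locus, rather than per overlapping pair, avoids double-counting a locus that overlaps several loci in another set, which is the role `uniq` plays in the command-line version.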
RESULTS
We collected an average of 3.47 M reads from enriched libraries (Supplemental Table 1; see Data Accessibility), and we assembled these reads into an average of 18,048 contigs having a mean length of 440 bp (Supplemental Table 2; see Data Accessibility). After searching for enriched, conserved loci among the contig assemblies, we identified an average of 1,446 targeted, conserved loci per library (range 525–1,882; Supplemental Table 3; see Data Accessibility) having a mean length of 666 bp per locus. From these loci, we created five different datasets (Table 1) that spanned the diversity of relationships within ostariophysans and extended beyond this clade to include Clupeiformes and other distantly related lineages (the otocephalan dataset). We describe specific results from each of these datasets below.
The gymnotiform dataset (Table 1) was one of two “young” ostariophysan subclades we studied (crown age 83–46 Ma; Hughes et al., 2018). We enriched an average of 1,871 UCE loci from members of this group that averaged 591 bp in length and represented 2,259 of 2,708 loci (83%) that we targeted (Supplemental Table 3; see Data Accessibility). Alignments generated from these loci contained an average of seven taxa (range 3–9). After alignment trimming, the 75% matrix contained 1,771 UCE loci that included an average of eight taxa (range 6–9). Each locus had an average trimmed length of 466 bp and an average of 62 parsimony informative sites. We joined these loci into a concatenated alignment file with a total length of 825,574 characters and 110,098 parsimony informative sites. RAxML bootstrap analyses required 50 iterations to reach the MRE stopping point, and we present the best ML tree with bootstrap support values in Figure 2.
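The parsimony-informative site counts reported in these paragraphs follow a standard definition: a site is informative under parsimony if at least two different character states each occur in at least two taxa. A minimal Python sketch of that count is below; treating gaps, missing data, and IUPAC ambiguity codes as uninformative (by counting only A, C, G, and T) is an assumption and may differ from how phyluce handles them. As a rough consistency check on the gymnotiform figures, 1,771 loci × 62 sites ≈ 109,800, close to the reported total of 110,098.

```python
from collections import Counter


def parsimony_informative_sites(alignment: dict[str, str]) -> int:
    """Count columns where >= 2 nucleotide states each appear in >= 2 sequences."""
    seqs = list(alignment.values())
    informative = 0
    for column in zip(*seqs):
        # Assumption: only unambiguous bases count; gaps, N, and '?' are ignored.
        states = Counter(base.upper() for base in column if base.upper() in {"A", "C", "G", "T"})
        if sum(1 for count in states.values() if count >= 2) >= 2:
            informative += 1
    return informative


toy = {"t1": "AAGT-", "t2": "AAGTA", "t3": "ACCTA", "t4": "ACCAN"}
print(parsimony_informative_sites(toy))  # 2
```

In the toy alignment, only columns 2 and 3 qualify; column 4 has a singleton A, and column 5 has just one state once the gap and N are ignored.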
The anostomoid dataset (Table 1) was the second of two “young” ostariophysan subclades we studied (crown age falls within 76–51 Ma; Hughes et al., 2018), and we enriched an average of 1,272 UCE loci from members of this group. These UCE loci averaged 493 bp in length and represented 1,987 of the 2,708 loci (73%) that we targeted (Supplemental Table 3; see Data Accessibility). Alignments of these loci contained an average of nine taxa (range 3–15). After alignment trimming, the 75% matrix included 879 UCE loci containing an average of 13 taxa (range 11–15). Each of these loci had an average trimmed length of 487 bp and an average of 68 parsimony informative sites. We joined these loci into a concatenated alignment with a total length of 428,381 characters and 59,928 parsimony informative sites. RAxML bootstrap analyses required 50 iterations to reach the MRE stopping point, and we present the best ML tree with bootstrap support values in Figure 3.
The loricarioid dataset (Table 1) represented an ostariophysan subclade of moderate age (crown age 116–131 Ma; Rivera-Rivera and Montoya-Burgos, 2017). We enriched an average of 1,379 UCE loci from members of this group having an average length of 781 bp and representing 2,176 of the 2,708 loci (80%) we targeted (Supplemental Table 3; see Data Accessibility). Alignments of these loci included an average of nine taxa (range 3–15). After alignment trimming, the 75% matrix comprised 938 UCE loci that included an average of 13 taxa (range 11–15). Each locus had an average trimmed length of 648 bp and an average of 261 parsimony informative sites. We joined these loci into a concatenated alignment file with a total length of 608,044 characters and 244,660 parsimony informative sites. RAxML bootstrap analyses required 50 iterations to reach the MRE stopping criterion, and we present the best ML tree with bootstrap support values in Figure 4.
The characiform dataset (Table 1) represented our second ostariophysan subclade of moderate age (∼122 Ma; Hughes et al., 2018). We enriched an average of 1,701 UCE loci from members of this group having an average length of 784 bp (Supplemental Table 3; see Data Accessibility) and representing 2,493 of the 2,708 loci we targeted (92%). Alignments of these loci included an average of 15 taxa (range 3–22). After alignment trimming, the 75% data matrix comprised 1,399 UCE loci that included an average of 19 taxa (range 16–22). Each locus had an average trimmed length of 577 bp and an average of 220 parsimony informative sites. We joined these loci into a concatenated alignment file with a total length of 807,240 characters and 307,465 parsimony informative sites. RAxML bootstrap analyses required 50 iterations to reach the MRE stopping criterion, and we present the best ML tree with bootstrap support values in Figure 5.
The otocephalan dataset (Table 1) represented the oldest clade of fishes we investigated (∼193 Ma; Hughes et al., 2018). We created this dataset by combining enrichment data from select lineages used in the datasets above with enrichment data collected using the same bait set from taxa representing Clupeiformes and Cypriniformes (Table 1). We then added in silico data harvested from more distant outgroups, both to show that the ostariophysan bait set is useful for studying these other groups and to demonstrate that it recovers reasonable relationships among these various lineages. From the taxa in this dataset on which we performed targeted enrichment, we collected an average of 1,447 UCE loci having an average length of 784 bp. When we combined these data with the in silico data harvested from existing genome sequences, the alignments represented 2,573 of 2,708 loci (95%), each alignment contained a mean of 11 taxa (range 3–21), and average alignment length was 445 bp. After alignment trimming, the 75% data matrix included 658 UCE loci containing an average of 17 taxa (range 15–21), with an average length of 384 characters per locus, a total length of 252,749 characters, and an average of 146 parsimony informative sites per locus. RAxML bootstrap analyses required 350 iterations to reach the MRE stopping criterion, and we present the best ML tree with bootstrap support values in Figure 6.
DISCUSSION
The bait set that we designed effectively collected data from the majority of the 2,708 UCE loci that we targeted across the four ostariophysan subclades we investigated: averaging across all experiments except the otocephalan dataset (which included many genome-enabled taxa), we enriched an average of 2,229 of the 2,708 loci (82%). The bait set also performed well with Amazonsprattus scintilla (Clupeiformes), from which we enriched 867 putatively orthologous loci. Because of our success enriching loci from the Clupeiformes, a close outgroup to the Ostariophysi, and despite our lack of a lineage representing the Gonorynchiformes, we describe this bait set as targeting the Ostariophysi rather than smaller subclades within this group. In the sections that follow, we discuss the phylogenetic hypotheses we generated for each taxonomic group.
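The 82% figure is the mean of the four subclade results reported in the Results (Gymnotiformes, Anostomoidea, Loricarioidei, and Characiformes). A short Python check of that arithmetic, using only the locus counts given there:

```python
TARGETED = 2708  # loci targeted by the ostariophysan bait set

# Loci represented in each subclade dataset (values from the Results).
recovered = {
    "Gymnotiformes": 2259,
    "Anostomoidea": 1987,
    "Loricarioidei": 2176,
    "Characiformes": 2493,
}

for group, n in recovered.items():
    print(f"{group}: {n}/{TARGETED} = {n / TARGETED:.0%}")

mean_loci = sum(recovered.values()) / len(recovered)
print(f"mean: {mean_loci:.2f} loci ({mean_loci / TARGETED:.0%})")  # mean: 2228.75 loci (82%)
```

The mean of 2,228.75 loci rounds to the 2,229 loci given in the text.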
The relationships we recover among the main lineages of Gymnotiformes (Fig. 2) agree with previous studies that used mtDNA genomes (Elbassiouny et al., 2016) or exons (Arcila et al., 2017). Similar to the results in these studies, we resolve Apteronotidae, represented in our dataset by Sternarchorhamphus muelleri, as sister to all remaining groups in the order. This placement of Apteronotidae disagrees with previous morphological and Sanger-based hypotheses, which suggested that either Gymnotidae (banded knifefishes of the genus Gymnotus and the electric eel; Tagliacollo et al., 2016) or only the electric eel Electrophorus (i.e., non-monophyletic Gymnotidae; Janzen, 2016) was the sister group to all the other families.
Our UCE results resolve representatives of the families that produce pulse-type electric organ discharges (Rhamphichthyidae [sand knifefishes] and Hypopomidae [bluntnose knifefishes]) as a monophyletic group. In contrast, we resolve the families that produce wave-type electric signals (Apteronotidae [ghost knifefishes] and Sternopygidae [glass and rat-tail knifefishes]) as paraphyletic, a hypothesis that contrasts with previous studies that used morphology or Sanger sequencing data and suggested these families formed a monophyletic group (Albert, 1998, 2001; Albert and Crampton, 2005; Janzen, 2016; Tagliacollo et al., 2016).
The differences between our placement of gymnotiform families and those of previous studies reflect the long and conflicting history of hypotheses about gymnotiform relationships, in which almost every possible arrangement of gymnotiform families has been proposed (Triques, 1993; Gayet et al., 1994; Alves-Gomes et al., 1995; Albert, 1998, 2001; Albert and Crampton, 2005; Janzen, 2016; Tagliacollo et al., 2016; Arcila et al., 2017). These conflicts may arise from a very rapid diversification event around the origin of the Gymnotiformes, which produced an evolutionary history muddled by incomplete lineage sorting. The causes of these incongruences, and methods to increase consistency in the inferences drawn from UCE data, are discussed more completely in Alda et al. (2019).
Our ML analyses (Fig. 3) recover a clear division between the omnivorous/herbivorous Anostomidae (headstanders) and a clade of three fully or partially detritivorous families (Chilodontidae, Curimatidae, and Prochilodontidae), a result also found by earlier, Sanger-based analyses (Melo et al., 2014, 2016, 2018; Burns and Sidlauskas, 2019). Relationships within Anostomidae match the Sanger-based results of Ramirez et al. (2017) and differ from the morphology-based hypothesis of Sidlauskas and Vari (2008) in the placement of Anostomus as sister to Leporellus (rather than Laemolyta). Relationships within Curimatidae are fully congruent with Vari's (1989) morphological hypothesis and a recent multilocus Sanger phylogeny (Melo et al., 2018).
We resolve Prochilodontidae and Chilodontidae as successive sister groups to Curimatidae. These results agree with one recent Sanger-based analysis (Burns and Sidlauskas, 2019) but differ from other recent Sanger sequencing studies (Oliveira et al., 2011; Melo et al., 2018), which reverse this order, and from Vari's (1983) morphological hypotheses, which suggested that Chilodontidae is sister to Anostomidae. Regardless of the exact relationships among Prochilodontidae, Chilodontidae, and Curimatidae, the branching order among these three primarily detritivorous characiform families is biologically interesting because each possible resolution implies a different and complex pattern of evolution in oral and pharyngeal dentition, the epibranchial organ, and numerous other anatomical systems. As noted for the Gymnotiformes, the short branches associated with the near-simultaneous origin of all three families may explain differences between this study and Sanger-based studies, and future work on these relationships would benefit from broader sampling across these families and more thorough phylogenetic analyses.
The major relationships we resolve among families in the Loricarioidei (Fig. 4) are congruent with previous morphological hypotheses (Mo, 1991; Lundberg, 1993; de Pinna, 1993, 1996, 1998), an earlier Sanger molecular hypothesis (Sullivan et al., 2006), and the exon-enrichment based molecular hypothesis of Arcila et al. (2017). Interestingly, we resolve the family Scoloplacidae (spiny-dwarf catfishes) and the family Astroblepidae (climbing catfishes) as successive sister groups to the Loricariidae (armored catfishes), a placement reported by other studies (de Pinna, 1998; Sullivan et al., 2006; Roxo et al., 2019) that suggests the loss of armor plating in Astroblepidae (de Pinna, 1998). Because relationships within this group remain controversial (Schaefer, 2003; Sullivan et al., 2006; Rivera-Rivera and Montoya-Burgos, 2017) and because the Loricarioidei is the most diverse suborder of Neotropical catfishes (Sullivan et al., 2006), additional studies of interfamilial relationships, including the placement of the Lithogeninae, and family status within the group are needed.
The overall pattern of relationships we resolved for the Characiformes (Fig. 5) is similar to those from multilocus Sanger sequencing (Oliveira et al., 2011; Burns and Sidlauskas, 2019) or exon-based (Arcila et al., 2017) studies. For example, our results separate the African Citharinoidei (Citharinidae and Distichodontidae) from other characiforms in the earliest divergence within the order and resolve Crenuchidae (Neotropical darters) as sister to all other members of the suborder Characoidei. Within the Characoidei, we resolved two major lineages. The first comprises the Ctenoluciidae (pike-characins), Lebiasinidae (pencilfishes), Acestrorhynchidae (dogtooth characins), Bryconidae (dorados and allies), Triportheidae (elongate hatchetfishes), and members of the hyperdiverse family Characidae (tetras). The second includes a monophyletic superfamily Anostomoidea (headstanders, toothless characiforms, and relatives) that is closely allied to Serrasalmidae (piranhas and pacus), Hemiodontidae (halftooths), and Parodontidae (scrapetooths), and more distantly related to Erythrinidae (trahiras) and the second clade of African families, Alestidae and Hepsetidae. Within Characoidei, the short internodes along the backbone of the phylogeny are consistent with previous results suggesting a rapid initial diversification of families within this suborder (Arcila et al., 2017; Chakrabarty et al., 2017; Burns and Sidlauskas, 2019).
The branching order we resolve among Lepisosteiformes, Anguilliformes, Osteoglossiformes, and Euteleostei relative to the otocephalan ingroup (Fig. 6) is similar to the pattern of major relationships among these fish groups resolved by other phylogenomic studies (Faircloth et al., 2013; Hughes et al., 2018). Similarly, the UCE data we enriched from lineages representing the Clupeiformes and Cypriniformes produced the same phylogenetic hypothesis for the branching order of these groups relative to the Characiphysi (Characiformes + Gymnotiformes + Siluriformes) as seen in other genome-scale (Hughes et al., 2018) and Sanger sequencing (Near et al., 2012; Betancur-R et al., 2013) studies. Relationships among the orders comprising otophysans are similar to some genome-scale studies and different from others, reflecting the difficulties noted when studying these groups (reviewed in Arcila et al., 2017, and Chakrabarty et al., 2017; Burns and Sidlauskas, 2019).
Overlaps with other bait sets
Our comparison of the target enrichment bait sets designed to capture UCE loci from actinopterygians, acanthomorphs, and ostariophysans shows that a majority of the ostariophysan UCE loci identified in this study differ from the UCE loci identified in previous studies (Fig. 7, Supplemental Table 4; see Data Accessibility). Although many of these loci are new, a core group of approximately 30 loci is shared among all of the UCE bait sets (Supplemental Table 4; see Data Accessibility), suggesting that data collected with each bait set can be combined using supermatrix approaches.
As detailed above, the data we collected using the ostariophysan bait set reconstruct reasonable phylogenetic hypotheses for all datasets, despite low taxon sampling (less than 1% of diversity for the overall study and less than 5% in Anostomoidea, the most densely sampled subclade). By reasonable, we mean that the phylogenetic hypotheses we resolved largely agree with previous investigations using multilocus Sanger sequencing data or genome-scale data collection approaches. Where our results differed from some prior studies, the relationships in question were subtended by very short internal branches, suggesting rapid or explosive radiation of the clade involved. These areas of treespace are hard to reconstruct (Pamilo and Nei, 1988; Maddison, 1997; Maddison and Knowles, 2006; Oliver, 2013), and many current studies focus on analytical approaches that produce the most accurate phylogenetic hypothesis given the data. The congruence of our results with stable parts of the trees inferred during these earlier studies, together with the ability of this bait set to enrich a large proportion of the targeted loci, suggests that our ostariophysan bait set provides one way to begin large-scale data collection from, and inference of relationships among, the more than 10,000 species that comprise the Ostariophysi, many of which have never been placed in a phylogeny.
Future work should explicitly test the effectiveness of this ostariophysan bait set for enriching loci from the Gonorynchiformes, the smallest ostariophysan order and a group for which tissue samples are few. Similarly, this bait set should be tested in the Alepocephaliformes, an enigmatic order of marine fishes that may form a close outgroup to the Ostariophysi. Despite those gaps, our in silico results suggest: (1) that this bait set may be useful in even more distant groups like the Osteoglossiformes or Euteleostei, and (2) the exciting possibility that we may be able to create a large (>1,000–2,000 loci), combined bait set targeting orthologous, conserved loci that are shared among actinopterygians to reconstruct a tree of life spanning the largest vertebrate radiation.
DATA ACCESSIBILITY
Sequence data from A. albifrons and C. paleatus used for locus identification are available from NCBI BioProject PRJNA493643, and sequence data from enriched libraries using the ostariophysan bait set are available from NCBI BioProject PRJNA492882. The ostariophysan bait design file is available from FigShare (doi: 10.6084/m9.figshare.7144199), where it can be updated, if needed. A static copy of the bait design file and all other associated files, including contig assemblies, UCE loci, and inferred phylogenies are available from Zenodo.org (doi: 10.5281/zenodo.1442082). Raw sequencing reads can be found at the NCBI SRA (SRR7939321–SRR7939322 and SRR10832350–SRR10832402). Supplemental material is available at https://www.copeiajournal.org/cg-18-139.
ACKNOWLEDGMENTS
We thank the curators, staff, and field collectors at the institutions listed in Table 1 for loans of tissue samples used in this project. This work was supported by grants from NSF to B. Faircloth (DEB-1242267), B. Sidlauskas (DEB-1257898), and P. Chakrabarty (DEB-1354149) and FAPESP to C. Oliveira (14/26508-3), B. Melo (16/11313-8), F. Roxo (14/05051-5), and L. Ochoa (14/06853-8). Animal tissues collected as part of this work followed protocols approved by the University of California Los Angeles Institutional Animal Care and Use Committee (Approval 2008-176-21). Portions of this research were conducted with high-performance computing resources provided by Louisiana State University (https://www.hpc.lsu.edu). M. Alfaro, B. Faircloth, and B. Sidlauskas conceived of the idea to design a bait set for ostariophysans. F. Alda, M. Burns, B. Faircloth, K. Hoekzema, and B. Melo collected data; J. Albert, P. Chakrabarty, L. Ochoa, C. Oliveira, and F. Roxo contributed data. B. Faircloth analyzed the data. B. Faircloth wrote the manuscript with substantial assistance from J. Albert, F. Alda, M. Alfaro, P. Chakrabarty, B. Melo, L. Ochoa, C. Oliveira, F. Roxo, and B. Sidlauskas. All authors edited and approved the final manuscript.
Associate Editor: W. L. Smith.