About Notiophilus Duméril, 1806 (Coleoptera, Carabidae): Species delineation and phylogeny using DNA barcodes

The genus Notiophilus Duméril, 1806 is a distinctive taxon of small, diurnal and morphologically similar beetles exhibiting large eyes and widened second elytral intervals. In this study we analysed the effectiveness of DNA barcodes to discriminate 67 specimens that represent 8 species of Notiophilus from Central Europe. Interspecific K2P distances below 2.2% were found for N. biguttatus (Fabricius, 1779) and N. quadripunctatus Dejean, 1826, whereas intraspecific distances with values > 2.2% were revealed for N. rufipes Curtis, 1829. An additional phylogenetic analysis of all available species revealed a close relationship of N. directus Casey, 1920, N. semistriatus Say, 1823, N. simulator Fall, 1906 and N. sylvaticus Dejean, 1831, possibly indicating a radiation of these species in North America. Low support values of most other nodes, however, do not allow additional phylogenetic conclusions.


Introduction
The Carabidae or ground beetles are a huge cosmopolitan family with an estimated number of more than 40,000 species worldwide (Lindroth 1985, Lorenz 2005). As part of its subfamily Nebriinae, the tribe Notiophilini Motschulsky 1850 is a relatively small taxon that includes only one genus: Notiophilus Duméril, 1806. Nevertheless, this genus is one of the more distinctive genera of Carabidae. With a body length of less than 7 mm, species of Notiophilus are small carabids that can be easily recognised by the enormous eyes and furrowed frons, their extremely uniform general habitus with narrow, parallel-sided elytra, as well as by the characteristic dilated second elytral interval ("Spiegelfeld") that can be as broad as the 3rd plus 4th interval or broader (Fig. 1) (e.g. Lindroth 1961-1969). Many species exhibit wing dimorphism with macropterous (long-winged) and brachypterous (short-winged) morphs (Lindroth 1986, Chapman et al. 2005). Typically, beetles of this genus are diurnal, sun-loving insects and very rapid in their movements. They are visually hunting carabids, preying upon mites, springtails and other small arthropods (e.g. Anderson 1972, Bauer 1981, Ernsting and Mulder 1981, Ernsting et al. 1992). To date, 57 species have been described from the Palearctic, Oriental, Nearctic and Neotropical regions (Barševskis 2011, Bousquet 2012, Löbl and Löbl 2017). For Europe, 14 species are recorded (Barševskis 2007), whereas 9 are known from Germany and Central Europe (Müller-Motzfeld 2006, Trautner et al. 2014). Thanks to the thorough studies of Arvīds Barševskis (Latvia), our knowledge about the biogeography and taxonomy of this genus has increased significantly in recent years (e.g. Barševskis 2001, 2007, 2009, 2011, 2012). Based on the fact that the largest diversity of species and the highest number of endemics are found in Asia, the central part of this continent is hypothesised as the centre of origin of Notiophilus, followed by a subsequent colonisation of Europe, North Africa and North America (Barševskis 2007). In terms of the phylogeny of this genus, however, no analysis has been performed to date.
As noted, species of Notiophilus are remarkably similar in habitus and display a considerable individual variation, making identification difficult (e.g. Lindroth 1961-1969, Hannig 2005, Heijermann and Aukema 2014). Consequently, molecular methods may represent a useful alternative for correct specimen identification. Recently, the analysis of DNA sequence data, in particular the use of an approx. 660 base pair (bp) fragment of the mitochondrial cytochrome c oxidase subunit 1 (COI), has been proposed as the marker of choice, as a so-called "DNA barcode", for specimen identification (Hebert et al. 2003a, Hebert et al. 2003b). DNA barcoding relies on the assumption that the observed interspecific genetic variation exceeds the intraspecific variation to such an extent that a clear gap exists. As a consequence, unidentified individuals can be assigned correctly to their species (Hebert et al. 2003a, Hebert et al. 2003b). Not surprisingly, DNA barcoding has been criticised from its beginning, for example for the inappropriate use of neighbour-joining trees for analysis or the application of fixed distance thresholds (Will and Rubinoff 2004, Goldstein and DeSalle 2010, Collins and Cruickshank 2013). Nevertheless, numerous studies clearly demonstrate the usefulness of DNA barcoding, in particular for insects (e.g. Hausmann et al. 2011, Park et al. 2011, Morinière et al. 2014, Schmidt et al. 2015, Havemann et al. 2018). Thus, the compilation of comprehensive and representative DNA barcode libraries represents an essential step for subsequent studies, for example, biodiversity assessment studies via metabarcoding based on modern high-throughput sequencing technologies (e.g. Yu et al. 2012, Cristescu 2014, Brandon-Mong et al. 2015, Porter and Hajibabaei 2018). Despite the high number of described species, however, the number of studies that tested the efficiency of DNA barcodes for species identification of ground beetles is still low (Greenstone et al. 2005, Maddison 2008, Raupach et al. 2010, Woodcock et al. 2013, Pentinsaari et al. 2014, Hendrich et al. 2015, Raupach et al. 2016, Raupach et al. 2018).
As part of our efforts in building a comprehensive DNA barcode library of ground beetles of Germany, we analysed the quality of DNA barcodes to discriminate Central European species of the carabid genus Notiophilus. Furthermore, we reconstructed the phylogeny of this small but charismatic carabid genus for the first time, with a focus on the zoogeographic distribution of the analysed species.

DNA barcode amplification, sequencing and data depository
Laboratory operations were carried out, following standardised protocols for COI amplification and sequencing (Ivanova et al. 2006, deWaard et al. 2008), at the Canadian Center for DNA Barcoding (CCDB), University of Guelph, the molecular labs of the Zoologisches Forschungsmuseum Alexander Koenig in Bonn and/or the working group Systematics and Evolutionary Biology at the Carl von Ossietzky University Oldenburg, Germany. Representative photos of each studied beetle were taken before molecular work was performed. One or two legs of one body side were removed for the subsequent DNA extraction, which was performed using the NucleoSpin Tissue Kit (Macherey-Nagel, Düren, Germany), following the extraction protocol.
Detailed information about the primers used, PCR amplification, and sequencing protocols can be found in a previous publication (see Raupach et al. 2016). All purified PCR products were cycle-sequenced and sequenced in both directions at a contract sequencing facility (GATC, Konstanz, Germany), using the same primers as used in PCR. Double stranded sequences were assembled and checked for mitochondrial pseudogenes (numts) by analysing the presence of stop codons, frameshifts, as well as double peaks in chromatograms with the Geneious version 8.1.9 programme package (Biomatters, Auckland, New Zealand) (Kearse et al. 2012). Routinely, BLAST searches (nBLAST, search set: others, programme selection: megablast) were performed to confirm the identity of all new sequences as ground beetle barcodes, based on already published sequences (high identity values, very low E-values).
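The stop codon part of the numt check described above can be illustrated with a short sketch. It is not the Geneious workflow used in the study; it only scans the three forward reading frames of a barcode for in-frame stop codons, assuming the invertebrate mitochondrial genetic code (NCBI translation table 5). The function names are hypothetical, and frameshifts or double peaks in chromatograms are not covered.

```python
# Stop codons under the invertebrate mitochondrial genetic code
# (NCBI translation table 5): only TAA and TAG terminate translation;
# TGA encodes Trp and AGA/AGG encode Ser in this code.
INVERT_MT_STOPS = {"TAA", "TAG"}


def stop_free_frames(seq):
    """Return the forward reading frames (0, 1, 2) without an in-frame stop codon."""
    seq = seq.upper().replace("-", "")
    frames = []
    for frame in range(3):
        # Complete codons only; a trailing partial codon is ignored.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in INVERT_MT_STOPS for codon in codons):
            frames.append(frame)
    return frames


def is_numt_suspect(seq):
    """Flag a sequence (hypothetical criterion) when every forward frame contains a stop."""
    return len(stop_free_frames(seq)) == 0


if __name__ == "__main__":
    # Toy sequences, not real barcodes.
    print(stop_free_frames("ATGTAAGGC"))
    print(is_numt_suspect("TAACTAACTAG"))
```

A functional COI fragment is expected to have at least one stop-free frame, so a sequence flagged here would merit a closer look at its chromatogram.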
Comprehensive voucher information, taxonomic classifications, photos, DNA barcode sequences, primer pairs used and trace files (including their quality) are publicly accessible through the public dataset "DS-BANOT" (Dataset ID: dx.doi.org/10.5883/DS-BANOT) on the Barcode of Life Data Systems (BOLD; www.boldsystems.org) (Ratnasingham and Hebert 2007). All new barcode data have been deposited in GenBank (accession numbers: MK567377-MK567411).

DNA Barcode analysis: Species identification
The analysis tools of the BOLD workbench were employed to calculate the nucleotide composition of the sequences and distributions of Kimura-2-parameter distances (K2P; Kimura 1980) within and between species (align sequences: BOLD aligner; ambiguous base/gap handling: pairwise deletion). All barcode sequences were subjected to the Barcode Index Number (BIN) analysis system implemented in BOLD, which clusters DNA barcodes in order to produce operational taxonomic units that typically correspond closely to species (Ratnasingham and Hebert 2013). A threshold of 2.2% was applied for a rough differentiation between intraspecific and interspecific distances based on Ratnasingham and Hebert (2013). These BIN assignments on BOLD are constantly updated as new sequences are added, splitting and/or merging individual BINs in light of new data (Ratnasingham and Hebert 2013).
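For readers unfamiliar with the K2P model, the following minimal sketch shows how such a distance can be computed for one pair of aligned sequences, with pairwise deletion of gaps and ambiguous bases. It uses the standard formula d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitional and transversional differences. It is not the BOLD implementation; the function names and the toy sequences are assumptions for illustration only.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance with pairwise deletion of gaps/ambiguities."""
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion: skip sites not unambiguous in both sequences
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def exceeds_threshold(distance, threshold=0.022):
    """True if a distance (as a proportion) lies above the 2.2% threshold."""
    return distance > threshold


if __name__ == "__main__":
    # Toy sequences differing by a single transition at the first site.
    d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
    print(round(d, 4), exceeds_threshold(d))
```

Applied to the values reported in this study, a maximum intraspecific distance of 3.62% (0.0362) would exceed the threshold, whereas an interspecific distance of 0.62% (0.0062) would not.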
In addition, all sequences were aligned using MUSCLE (Edgar 2004) and analysed using a neighbour-joining cluster analysis (NJ; Saitou and Nei 1987) based on K2P distances with MEGA X (Kumar et al. 2018) in order to visualise the DNA barcode divergences and species clusters. As outgroup taxa we used three barcode sequences of Nebria brevicollis (Fabricius, 1792) (accession numbers: KM451780, KM452043, KM452651). Non-parametric bootstrap support values were obtained by re-sampling and analysing 1,000 replicates (Felsenstein 1985) implemented in MEGA X. For species pairs with interspecific distances < 2.2%, maximum parsimony networks were constructed with TCS 1.21, based on default settings (Clement et al. 2000) as part of the software package PopART v.1.7 (Leigh and Bryant 2015) after an alignment using MUSCLE (Edgar 2004). Such networks allow the identification of possible haplotype sharing between species as a consequence of recent speciation or on-going hybridisation processes.
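The re-sampling step behind non-parametric bootstrap values can be sketched as follows. The example draws alignment columns with replacement to build one pseudo-replicate; in a full analysis, a tree would be inferred from each of the 1,000 replicates and the frequency of each cluster recorded. This is an illustrative sketch, not MEGA X code, and the function name and toy alignment are assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap pseudo-replicate).

    alignment: dict mapping sequence names to aligned sequences of equal length.
    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # The same column indices are used for every sequence, so homology is preserved.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"specimen_1": "ACGTAC", "specimen_2": "ACGTAT", "outgroup": "TCGAAC"}
    rng = random.Random(1)
    replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
    print(len(replicates), len(replicates[0]["specimen_1"]))
```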

DNA Barcode analysis: Phylogenetic applicability
As part of our phylogenetic study, we used one representative sequence per analysed species, namely a sequence of the most abundant haplotype. Furthermore, we added sequences of all additional species available at BOLD with a length of at least 500 base pairs (bp), following the same procedure if more than one sequence was given. [...] were used as outgroup taxa. In total, this dataset consisted of 20 sequences. All sequences were aligned using MUSCLE with default settings (Edgar 2004).
The accuracy of phylogenetic reconstructions depends on various factors, e.g. sequence quality, the correct identification of homologous sites, the absence of heterotachy or, in particular, substitution saturation (Xia 2009). In the extreme case that sequences have experienced full substitution saturation, the given similarity between the sequences will depend entirely on the similarity in nucleotide frequencies and will often not reflect their phylogenetic relationships (e.g. Steel et al. 1993, Xia et al. 2003). As a consequence, fast evolving protein coding genes, such as COI, cannot be used for phylogenetic analyses that focus on deep and old branches (e.g. Wetzer 2002, Goetze 2003, Maddison et al. 2014), but can be useful for the study of more recent phylogenetic events on species level (e.g. Klopfstein et al. 2010, Matzen da Silva et al. 2011, Dai et al. 2012). Therefore, DAMBE 7.0.28 (Xia 2018) was used to check if the COI dataset of Notiophilus was subject to saturation following the Xia approach (Xia 2009). Saturation plots were made using the number of transitions and transversions plotted against patristic distances (p-distances).
Phylogenetic relationships were analysed under the maximum likelihood criterion using IQ-TREE 1.6.8 (Nguyen et al. 2015). The best model of nucleotide substitution was determined based on the Bayesian Information Criterion (BIC) with Modelfinder (Kalyaanamoorthy et al. 2017). In order to assess nodal support, 10,000 ultrafast bootstrap replicates (Hoang et al. 2018) and 10,000 replicates of an SH-aLRT test (Guindon et al. 2010) were performed. Ultrafast bootstrapping (UFBoot) has been demonstrated to be largely unbiased compared to standard or alternative bootstrapping, whereas SH-aLRT values have been shown to be as conservative as standard non-parametric bootstrap values (Minh et al. 2013). Typically, nodes with support values of UFBoot ≥ 95 and SH-aLRT ≥ 90 were considered as very robust and values ≥ 80% as robust (Minh et al. 2013, Hoang et al. 2018). Following Barševskis (2007), we added biogeographic information for each analysed species.
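The support categories defined above can be expressed as a small decision rule. The sketch below is one reading of these thresholds, assuming that both measures must reach the respective cut-off for a node to be placed in a category; the function name and the category labels are chosen for illustration.

```python
def classify_support(sh_alrt, ufboot):
    """Classify a node from its SH-aLRT and UFBoot support values (both in %)."""
    if sh_alrt >= 90 and ufboot >= 95:
        return "very robust"
    if sh_alrt >= 80 and ufboot >= 80:
        return "robust"
    return "weak"


if __name__ == "__main__":
    # Support pairs (SH-aLRT, UFBoot) as reported for four clades in this study.
    for pair in [(100, 100), (87.4, 85), (97.5, 95), (99.1, 97)]:
        print(pair, classify_support(*pair))
```

Under this reading, a node with 87.4%/85% falls into the intermediate category, whereas nodes with 97.5%/95% or higher meet both criteria for the highest category.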

DNA Barcode analysis: Species identification
Overall, 67 DNA barcode sequences were analysed for eight of the nine species of the genus Notiophilus from Germany. Fragment lengths of the analysed DNA barcode fragments ranged from 549 to 658 bp. As is typical for arthropods, a high AT-content was found for the DNA barcode region: the mean sequence compositions were A = 28%, C = 16.3%, G = 17.3% and T = 38.4%. Intraspecific K2P distances ranged from zero to a maximum of 3.62% (N. rufipes), whereas interspecific distances within the analysed genus had values between 0.62 and 10.22% (Table 1). The lowest interspecific distances of distinct barcode clusters were found for N. biguttatus and N. quadripunctatus with values ranging from 0.49% to 0.82% (Table 1). As a result, both species were assigned to the same BIN (AAO0964). In contrast to this, maximum intraspecific pairwise distances > 2.2% were found for N. rufipes (3.62%), resulting in two BINs (AAX5571, AAC7024) for this species (Table 1). Unique BINs were identified for the remaining five species (63%).
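The base composition figures can be reproduced in principle with a simple count over all sequences. The sketch below pools unambiguous bases across sequences, which may differ slightly from the way BOLD averages per-sequence values; the function name and the toy input are assumptions. The last lines only restate the reported means, which sum to 100% and imply an AT-content of 66.4%.

```python
from collections import Counter


def base_composition(sequences):
    """Percentage of A, C, G and T over all unambiguous bases in the sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return {base: 100 * counts[base] / total for base in "ACGT"}


if __name__ == "__main__":
    # Toy input, not real barcode data.
    print(base_composition(["AATT", "ACGT"]))

    # Mean composition as reported in the text.
    reported = {"A": 28.0, "C": 16.3, "G": 17.3, "T": 38.4}
    print(round(sum(reported.values()), 1), round(reported["A"] + reported["T"], 1))
```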
The NJ analyses, based on K2P distances, revealed non-overlapping clusters with bootstrap support values of 100% for six species (75%). Nodal support values below 85% were found for N. biguttatus and N. quadripunctatus (Fig. 2). A detailed topology is presented in the supporting information (Suppl. material 1). Our statistical maximum parsimony analysis indicated closely related haplotypes for the studied specimens of N. biguttatus (n = 16) and N. quadripunctatus (n = 3) (Fig. 3). We identified three different haplotypes with one dominant haplotype (h1) for N. biguttatus (Fig. 3), whereas only one haplotype (h1*) was found for all analysed beetles of N. quadripunctatus (n = 3). However, this haplotype is separated from haplotypes h1 and h2 of N. biguttatus only by five additional mutational steps (Fig. 3). Two distinct monophyletic lineages, in combination with high distances, were found for N. rufipes (Figs 2, 4, Table 1).
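The two quantities underlying a haplotype network, namely haplotype frequencies and the number of mutational steps between haplotypes, can be illustrated with a short sketch. It does not reproduce the TCS algorithm, which additionally estimates a parsimony connection limit and infers missing intermediate haplotypes. Gaps are counted as differences here, in line with treating gaps as a fifth state. Function names and the toy sequences are invented for illustration.

```python
from collections import Counter


def collapse_haplotypes(sequences):
    """Count identical aligned sequences; returns a Counter {haplotype: n specimens}."""
    return Counter(seq.upper() for seq in sequences)


def mutational_steps(hap_a, hap_b):
    """Number of differing sites between two aligned haplotypes (gap = fifth state)."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(1 for a, b in zip(hap_a, hap_b) if a != b)


if __name__ == "__main__":
    # Toy aligned sequences, not the barcodes analysed in this study.
    specimens = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TCGAACGGTC"]
    haplotypes = collapse_haplotypes(specimens)
    print(haplotypes.most_common(1))
    print(mutational_steps("ACGTACGTAC", "TCGAACGGTC"))
```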

DNA Barcode analysis: Phylogenetic applicability
The test of substitution saturation revealed that the observed index of substitution saturation (Iss: 0.22) for the alignment was significantly lower than the corresponding critical index of substitution saturation (Iss.c (symmetrical tree): 0.74; Iss.c (asymmetrical tree): 0.54), indicating that there was no or little saturation in the dataset (Suppl. material 2).
Modelfinder revealed the GTR+F+R3 model as the optimal nucleotide substitution model for our dataset with the following rate parameters: nucleotide frequencies A: 0.29, C: 0.16, G: 0.17, T: 0.38; substitution rates RAC: 0.01, RAG: 40.39, RAT: 21.52, RCG: 1.45, RCT: 98.02, RGT: 1; model of rate heterogeneity: FreeRate with 3 categories: category 1 with a relative rate = 0.06 and a proportion of 0.69, category 2 with a relative rate = 2.02 and a proportion of 0.27 and category 3 with a relative rate = 12.74 and a proportion of 0.03.
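As a plausibility check of the FreeRate parameters, the proportion-weighted mean of the relative rates can be computed. The sketch assumes the usual convention that FreeRate categories are scaled so that this weighted mean equals 1; with the rounded values reported above, small deviations are expected. The variable and function names are chosen for illustration.

```python
# (relative rate, proportion) for the three FreeRate categories reported in the text.
FREERATE_CATEGORIES = [(0.06, 0.69), (2.02, 0.27), (12.74, 0.03)]


def weighted_mean_rate(categories):
    """Proportion-weighted mean of the relative rates."""
    return sum(rate * proportion for rate, proportion in categories)


def total_proportion(categories):
    """Sum of the category proportions (expected to be close to 1)."""
    return sum(proportion for _, proportion in categories)


if __name__ == "__main__":
    print(round(weighted_mean_rate(FREERATE_CATEGORIES), 3))
    print(round(total_proportion(FREERATE_CATEGORIES), 2))
```

With the rounded values, the weighted mean is about 0.97 and the proportions sum to 0.99, which is compatible with a mean rate of 1 before rounding. It also shows that roughly two thirds of the sites evolve very slowly, while a small fraction of about 3% changes more than ten times faster than average.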
Table 1. Molecular distances based on the Kimura 2-parameter model of the analysed specimens and species of the genus Notiophilus. Divergence values were calculated for all studied sequences, using the Nearest Neighbour Summary implemented in the Barcode Gap Analysis tool provided by the Barcode of Life Data System (BOLD). Align sequences option: BOLD aligner (amino acid based HMM), ambiguous base/gap handling: pairwise deletion. ISD = intraspecific distance. BINs are based on the barcode analysis from 18-11-2018. Species with maximum intraspecific distances > 2.2% and species pairs with interspecific distances < 2.2% are marked in bold.

The results of the phylogenetic analysis are visualised in Figure 5. High nodal support > 90% was found for five nodes only, whereas medium support (SH-aLRT: 80-90%; UFBoot: 80-90%) was revealed for two nodes. All other nodes had support values < 80%, indicating low support. High nodal support values revealed that N. aeneus represents the sister taxon to all other analysed Notiophilus species. All other taxa are part of two clades: one clade included N. biguttatus and N. quadripunctatus (100%/100%); all other species were found in a second clade with medium support (87.4%/85%). Furthermore, high nodal support was found for a clade with N. directus, N. semistriatus, N. simulator and N. sylvaticus (97.5%/95%) and a clade with N. germinyi, N. rufipes and N. palustris (99.1%/97%).

Discussion
For many decades, ground beetles have been used regularly as indicators of biodiversity and habitat quality (e.g. Goulet 2003, Koivula 2011, Kotze et al. 2011, Li et al. 2017). Consequently, their correct identification represents a pivotal component for ecological studies and conservation planning. Our species delineation analysis demonstrated that most (n = 7, 87.5%) of the analysed species of Notiophilus from Germany and Central Europe can be successfully identified by using DNA barcode sequence data and the BIN approach. This result is consistent with previous barcoding studies of ground beetles (Raupach et al. 2010, Raupach et al. 2011, Pentinsaari et al. 2014, Hendrich et al. 2015, Raupach et al. 2018). Nevertheless, our analysis revealed low interspecific distances, as well as high intraspecific variability, that are worthy of discussion.
Low interspecific distances were found for N. biguttatus and N. quadripunctatus (0.62%) (Fig. 3). Based on their very similar morphology, a close relationship has been previously hypothesised (e.g. Hemmann and Trautner 2002). Both species can occur sympatrically. However, only comprehensive analyses of i) more specimens sampled from various localities, ii) other faster evolving markers, in particular nuclear markers such as microsatellites or RAD-Seqs, and iii) comprehensive morphological and morphometric studies will help to clarify whether two closely related but distinct species exist or whether hybridisation still takes place.
In contrast to this, maximum intraspecific pairwise distances with values between 1.5 and 3.6% were found between two distinct monophyletic lineages of N. rufipes (Fig. 4). The collection sites of both lineages A (n = 6) and B (n = 2) revealed no specific geographical pattern (Fig. 4). We also found no differences in their male genitalic characters. Based on the low number of studied specimens and the mitochondrial marker used, we are currently unable to identify factors that generate the observed variability. Examples of such factors may include: i) phylogeographic events as reported for other carabids (e.g. Zhang et al. 2006, Faille et al. 2015, Weng et al. 2016), ii) the presence of maternally inherited endosymbionts such as Wolbachia (e.g. Roehrdanz and Levitan 2007, Duron et al. 2008, Werren et al. 2008, Gerth et al. 2011), or iii) the existence of cryptic species (e.g. Faille et al. 2013, Liebherr 2015, Sproul and Maddison 2017). Additional specimens from different locations have to be carefully analysed using morphological and molecular data to resolve these questions.
Despite the fact that only few nodes had high support values, the phylogenetic analysis revealed some important results: i) N. aeneus represents the sister taxon to all other analysed Notiophilus species, ii) all other taxa are part of two clades: one clade includes N. biguttatus and N. quadripunctatus with maximum support (100%/100%); all other species are found in a second clade with medium support (87.4%/85%), iii) high nodal support is shown for a clade with the closely related species N. directus, N. semistriatus, N. simulator and N. sylvaticus and iv) high nodal support is revealed for a clade with N. germinyi, N. rufipes and N. palustris (Fig. 5). The close relationship of N. directus, N. semistriatus, N. simulator and N. sylvaticus and the low distance values between these species (1.8 to 6.4%) provide evidence for a possible radiation of these four species in North America (Fig. 5). If Asia indeed represents the hypothesised centre of origin of Notiophilus (Barševskis 2007), North America has been colonised at least two times. Interestingly, both species that were documented for Africa are closely related. The low support values of most nodes, however, do not allow additional suggestions concerning the colonisation patterns of other regions by this genus.

Conclusions
The assessment of biodiversity using molecular tools represents an essential aspect of modern biological sciences. In this context, our dataset represents another step in building a comprehensive DNA barcoding library for carabids in Germany and Central Europe. Furthermore, a first phylogenetic analysis of this genus is presented. Although the present dataset included sequences of only 15 of the 57 known species of Notiophilus and, in particular, endemic species from Central Asia are missing, our analysis reveals some important insights into the phylogeny of this genus, including a well-supported clade of N. directus, N. semistriatus, N. simulator and N. sylvaticus that gives some evidence for a possible radiation of these species in North America, as well as a close relationship of N. germinyi, N. palustris and N. rufipes.

Figure 3. Maximum statistical parsimony network of Notiophilus biguttatus (Fabricius, 1779) and Notiophilus quadripunctatus Dejean, 1826. Parameters used included default settings for connection steps, gaps being treated as fifth state. Each line represents a single mutational change, whereas small black dots indicate missing haplotypes. The numbers of analysed specimens (n) are listed and the diameter of the circles is proportional to the number of specimens for each haplotype (see given open half circles with numbers). Scale bars = 1 mm. Source of photos: http://www.eurocarabidae.de/ (access date: 2019-01-15).

Figure 4. Subtree of the neighbour-joining topology, based on Kimura 2-parameter distances of all analysed specimens of Notiophilus rufipes Curtis, 1829. Branches are labelled with the specimen ID number from BOLD, species name and sample locality. Numbers next to internal nodes are non-parametric bootstrap values (in %). Source of photo: http://www.eurocarabidae.de/ (access date: 2019-01-15).

Figure 5. Maximum likelihood phylogeny inferred in IQ-TREE, based on the COI barcode fragment for the genus Notiophilus. The model of nucleotide substitution used was selected with Modelfinder as part of the IQ-TREE software package. The tree was rooted with five Nebria species as outgroup. Nodal support was calculated with SH-aLRT (above) and UFBoot (below) values. Black dots indicate very robust nodes with very high values (SH-aLRT ≥ 90%, UFBoot ≥ 95%), grey dots indicate moderately robust nodes (SH-aLRT ≥ 80%, UFBoot ≥ 80%) and white dots indicate weak nodes (SH-aLRT < 80%, UFBoot < 80%) (see Material and Methods for details). Continent silhouettes indicate the biogeographic distribution of the analysed taxa (from left to right: Africa, Europe, Asia and North America).