Knowledge Center - 北京概普生物科技有限公司 (GapTech)

March 2022: A Quick Look at Notable Papers

Bioinformatics Tips · Montreal · April 11, 2022, 14:33

Today we'd like to recommend a new member of the "rxiv" family: jobRxiv. It is an arXiv-style site built specifically for job hunting in academia: https://jobrxiv.org/

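According to the site's own description, jobRxiv was born out of the pandemic, with the aim of making the recruitment market fairer: every job seeker can see every job ad, and every lab can find suitable candidates ("To make recruitment fairer: with jobRxiv, every candidate has access to every job, and all labs can find the best candidates, whatever their budget").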

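It carries the "rxiv" name because, like other preprint sites, jobRxiv is built on the principles of being open and free. Although these principles are widely applied in academic publishing in the form of preprints, sites that take this approach to academic recruitment are probably still rare. "Free" means that employers can post an ad at no charge, and job seekers can browse every posted ad and shop around. Of course, an employer who wants to post several ads at once (a bundle) has to pay a fee, which helps keep the site running. Note that although the names all end in "rxiv", jobRxiv has no actual affiliation with bioRxiv, medRxiv, or arXiv, the original preprint server.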

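The home page is very simple: just a list of job ads, one after another. A search panel on the right helps users narrow things down quickly. The range of positions is quite broad, from PhD students to postdocs, from university professors to journal editors, as well as project leads, department chairs, and more.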

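When we tried searching for China, we unfortunately found only four ads, whereas searching for Canada returned 33. Clearly jobRxiv still has little reach in China. We urge lab heads to make good use of this free platform.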

To follow jobRxiv's job postings, you can either visit the site directly or get the latest updates through social media.

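Since we are on the subject of recruitment, here is a recent job posting we came across on social media: 40 fully funded PhD positions at Université Paris-Saclay in Paris, France, covering plant science, food science, genomics, and other fields. The global application season is almost over, and with the pandemic still going on, anyone hoping to study abroad should act quickly (link at the end of this post).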

That's it for jobRxiv. Next up is our quick look at last month's notable papers on bioRxiv.

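[Modification] New paper from CRISPR gene-editing pioneer and Nobel laureate Doudna: histone modifications boost editing efficiency (judging by the format, is it headed for PNAS?)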

Decorating chromatin for enhanced genome editing using CRISPR-Cas9

CRISPR-associated (Cas) enzymes have revolutionized biology by enabling RNA-guided genome editing. Homology-directed repair (HDR) in the presence of donor templates is currently the most versatile way to introduce precise edits following CRISPR-Cas-induced double-stranded DNA cuts, but HDR efficiency is generally low relative to end-joining pathways that lead to insertions and deletions (indels). We tested the hypothesis that HDR could be increased using a Cas9 construct fused to PRDM9, a chromatin remodeling factor that deposits histone methylations H3K4me3 and H3K36me3 shown to mediate homologous recombination in human cells. Our results show that the fusion protein contacts chromatin specifically at the Cas9 cut site in DNA to double the observed HDR efficiency and increase the HDR:indel ratio by 3-fold compared to that induced by Cas9 alone. HDR enhancement occurred in multiple cell lines with no increase in off-target genome editing. These findings underscore the importance of chromatin structure for the choice of DNA repair pathway during CRISPR-Cas genome editing and provide a new strategy to increase the efficiency of HDR.

Significance Statement: CRISPR-Cas-mediated homology-directed repair (HDR) enables precision genome editing for diverse research and clinical applications, but HDR efficiency is often low due to competing end-joining pathways. Here, we describe a simple strategy to influence DNA repair pathway choice and improve HDR efficiency by engineering CRISPR-Cas9-methyltransferase fusion proteins. This strategy highlights the impact of histone modifications on DNA repair following CRISPR-Cas-induced double-stranded breaks and adds to the CRISPR genome editing toolbox.

[Structure] University of Chicago: deep learning aids protein side-chain packing prediction

An end-to-end deep learning method for rotamer-free protein side-chain packing

Protein side-chain packing (PSCP), the task of determining amino acid side-chain conformations, has important applications to protein structure prediction, refinement, and design. Many methods have been proposed to resolve this problem, but their accuracy is still unsatisfactory. To address this, we present AttnPacker, an end-to-end, SE(3)-equivariant deep graph transformer architecture for the direct prediction of side-chain coordinates. Unlike existing methods, AttnPacker directly incorporates backbone geometry to simultaneously compute all amino acid side-chain atom coordinates without delegating to a rotamer library, or performing expensive conformational search or sampling steps. Tested on the CASP13 and CASP14 native and non-native protein backbones, AttnPacker predicts side-chain conformations with RMSD significantly lower than the best side-chain packing methods (SCWRL4, FASPR, Rosetta Packer, and DLPacker), and achieves even greater improvements on surface residues. In addition to RMSD, our method also achieves top performance in side-chain dihedral prediction across both data sets.
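
To make the RMSD metric quoted in the abstract concrete, here is a minimal Python sketch of how the root-mean-square deviation between predicted and reference side-chain atom coordinates could be computed. It is not AttnPacker's evaluation code: the function name `side_chain_rmsd` and the toy coordinates are hypothetical, and the sketch assumes both structures are already superimposed and list their atoms in the same order.

```python
import math


def side_chain_rmsd(predicted, reference):
    """Root-mean-square deviation between two equally long lists of (x, y, z) atoms.

    Assumes the structures are already superimposed and that atom i in
    `predicted` corresponds to atom i in `reference`.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        squared_sum += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(squared_sum / len(predicted))


# Hypothetical toy coordinates (in angstroms) for a three-atom side chain.
predicted_atoms = [(1.0, 0.0, 0.0), (2.1, 0.4, 0.0), (3.0, 1.2, 0.3)]
reference_atoms = [(1.0, 0.0, 0.0), (2.0, 0.5, 0.0), (3.2, 1.0, 0.0)]
print(side_chain_rmsd(predicted_atoms, reference_atoms))
```

A lower value means the predicted atoms sit closer to the reference positions, which is the sense in which the abstract compares packing methods.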

[Speed-up] Narayanasamy Lab, University of Michigan: minimap2, the popular alignment tool for third-generation sequencing, gets faster again!

Accelerating Minimap2 for accurate long read alignment on GPUs

We extract better intra-read parallelism from chaining without losing mapping accuracy by forward transforming Minimap2’s chaining algorithm. Further, we utilize the high memory available on modern cloud instances for better performance on the GPU by converting a sparse vector which defines the chaining workload to a dense one in order to optimize for better arithmetic intensity (more operations per byte of data fetched from high-latency global memory) on the GPU. We also optimize for better workload balancing, data locality and minimal branch divergence on the GPU. We show mm2-ax on an NVIDIA A100 GPU improves the chaining step with 12.6 - 5X speedup and 9.44 - 3.77X speedup:costup over the fastest version of Minimap2, mm2-fast, benchmarked on a single Google Cloud Platform instance of 30 SIMD cores.

[Visualization] 广州数智生物: GraphBio, an online visualization tool for omics analysis

GraphBio: a shiny web app to easily perform popular visualization analysis for omics data

Here, we present GraphBio, a shiny web app to easily perform visualization analysis for omics data. GraphBio provides 15 popular visualization analysis methods, including heatmap, volcano plots, MA plots, network plots, dot plots, chord plots, pie plots, four quadrant diagrams, venn diagrams, cumulative distribution curves, PCA, survival analysis, ROC analysis, correlation analysis and text cluster analysis. It enables experimental biologists without programming skills to easily perform popular visualization analysis and get publication-ready figures.

[Review] Applications of deep learning in spatial transcriptomics

Deep Learning in Spatial Transcriptomics: Learning From the Next Next-Generation Sequencing

Spatial transcriptomics (ST) technologies are rapidly becoming the extension of single-cell RNA sequencing (scRNAseq), holding the potential of profiling gene expression at a single-cell resolution while maintaining cellular compositions within a tissue. Having both expression profiles and tissue organization enables researchers to better understand cellular interactions and heterogeneity, providing insight into complex biological processes that would not be possible with traditional sequencing technologies. The data generated by ST technologies are inherently noisy, high-dimensional, sparse, and multi-modal (including histological images, count matrices, etc.), thus requiring specialized computational tools for accurate and robust analysis. However, many ST studies currently utilize traditional scRNAseq tools, which are inadequate for analyzing complex ST datasets. On the other hand, many of the existing ST-specific methods are built upon traditional statistical or machine learning frameworks, which have shown to be sub-optimal in many applications due to the scale, multi-modality, and limitations of spatially-resolved data (such as spatial resolution, sensitivity and gene coverage). Given these intricacies, researchers have developed deep learning (DL)-based models to alleviate ST-specific challenges. These methods include new state-of-the-art models in alignment, spatial reconstruction, and spatial clustering among others. However, deep-learning models for ST analysis are nascent and remain largely underexplored. In this review, we provide an overview of existing state-of-the-art tools for analyzing spatially-resolved transcriptomics, while delving deeper into the DL-based approaches. We discuss the new frontiers and the open questions in this field and highlight the domains in which we anticipate transformational DL applications.

[Mobile] Université Paris-Saclay: mobile elements in the genome of the plant-pathogenic fungus Botrytis cinerea (gray mold)

Botrytis cinerea strains infecting grapevine and tomato display contrasted repertoires of accessory chromosomes, transposons and small RNAs

The fungus Botrytis cinerea is a polyphagous pathogen that encompasses multiple host-specialized lineages. While several secreted proteins, secondary metabolites and retrotransposons-derived small RNAs have been characterized as virulence factors, their role in host specialization remain unknown. The aim of this study was to identify the genomic correlates of host-specialization in populations of B. cinerea associated with grapevine and tomato. Using PacBio sequencing, we produced complete assemblies of the genomes of strains Sl3 and Vv3 that represent the French populations T and G1 of B. cinerea, specialized on tomato and grapevine, respectively. Both assemblies revealed 16 core chromosomes that were highly syntenic with chromosomes of the reference strain B05.10. The main sources of variation in gene content were the subtelomeric regions and the accessory chromosomes, especially the chromosome BCIN19 of Vv3 that was absent in Sl3 and B05.10. The repertoires and density of transposable elements were clearly different between the genomes of Sl3 and Vv3 with a larger number of subfamilies (26) and a greater genome coverage in Vv3 (7.7%) than in Sl3 (14 subfamilies, 4.5% coverage). An Helitron-like element was found in almost all subtelomeric regions of the Vv3 genome, in particular in the flanking regions of a highly duplicated gene encoding a Telomere-Linked Helicase, while both features were absent from the Sl3 and B05.10 genomes. Different retrotransposons in the Sl3 and the Vv3 strains resulted in the synthesis of distinct sets of small RNAs. Finally, extending the study to additional strains indicated that the accessory chromosome BCIN19 and the small RNAs producing retrotransposons Copia_4 and Gypsy_7 are common features of the G1 population that are scarcely if ever found in strains isolated from other populations. This research reveals that accessory chromosomes, repertoires of transposons and their derived small RNAs differ between populations of B. cinerea specialized on different hosts. The genomic data characterized in our study pave the way for further studies aiming at investigating the molecular mechanisms underpinning host specialization in a polyphagous pathogen.

[Limbless] Caecilian (a group of amphibians) genomes reveal the mechanism behind the convergent evolution of limblessness

Caecilian genomes reveal molecular basis of adaptation and convergent evolution of limblessness in snakes and caecilians

We present genome sequences for Geotrypetes seraphini (3.8Gb) and Microcaecilia unicolor (4.7Gb) caecilians, a limbless, mostly soil-dwelling amphibian clade with reduced eyes, and unique putatively chemosensory tentacles. We identify signatures of positive selection unique to caecilians in 1,150 orthogroups, with enrichment of functions for olfaction and detection of chemical signals. All our caecilian genomes are missing the ZRS enhancer of Sonic Hedgehog, shown by in vivo deletions to be required for limb development in mice and also absent in snakes, thus revealing a shared molecular target implicated in the independent evolution of limblessness in snakes and caecilians.

[Radiation] Duke University (USA): multi-omics study offers clues to the molecular mechanisms of sea urchin development

Recent reconfiguration of an ancient developmental gene regulatory network in Heliocidaris sea urchins

Changes in developmental gene regulatory networks (dGRNs) underlie much of the diversity of life, but the evolutionary mechanisms that operate on interactions with these networks remain poorly understood. Closely related species with extreme phenotypic divergence provide a valuable window into the genetic and molecular basis for changes in dGRNs and their relationship to adaptive changes in organismal traits. Here we analyze genomes, epigenomes, and transcriptomes during early development in two sea urchin species in the genus Heliocidaris that exhibit highly divergent life histories and in an outgroup species. Signatures of positive selection and changes in chromatin status within putative gene regulatory elements are both enriched on the branch leading to the derived life history, and particularly so near core dGRN genes; in contrast, positive selection within protein-coding regions have at most a modest enrichment in branch and function. Single-cell transcriptomes reveal a dramatic delay in cell fate specification in the derived state, which also has far fewer open chromatin regions, especially near dGRN genes with conserved roles in cell fate specification. Experimentally perturbing the function of three key transcription factors reveals profound evolutionary changes in the earliest events that pattern the embryo, disrupting regulatory interactions previously conserved for ∼225 million years. Together, these results demonstrate that natural selection can rapidly reshape developmental gene expression on a broad scale when selective regimes abruptly change and that even highly conserved dGRNs and patterning mechanisms in the early embryo remain evolvable under appropriate ecological circumstances.

[tRNA] A brand-new plant tRNA database

PtRNAdb: A web resource of Plant tRNA genes from a wide range of plant species

tRNAs, as well as their derived products such as short interspersed nuclear elements (SINEs), pseudogenes and transfer-RNA-derived fragments (tRFs), have now been shown to be vital for cellular life, functioning and adaptation during different stress conditions in all diverse life forms. In this study, we have developed PtRNAdb (www.nipgr.ac.in/PtRNAdb), a plant exclusive tRNA database containing 113849 tRNA gene sequences from phylogenetically diverse plant species. We have analysed a total of 106 nuclear, 89 plastidial and 38 mitochondrial genomes of plants by tRNAscan-SE software package, and after careful curation of the output data, we developed this database and integrated the data. The information about the tRNA gene sequences obtained was further enriched with consensus sequence based study of tRNA genes based on their isoacceptors and isodecoders. We have also built covariance models based on the isoacceptors and isodecoders of all the tRNA sequences using infernal tool. The user can also perform BLAST not only against PtRNAdb entries but also against all the tRNA sequences stored in PlantRNA databases; and annotated tRNA genes across the plant kingdom available at NCBI. For the users’ ease, we have also incorporated the tRNAscan-SE tool for tRNA gene prediction, and ViennaRNA package for structural analysis on the home page of PtRNAdb. This resource is believed to be of high utility for plant researchers as well as molecular biologists to carry out further exploration of plant tRNAome on a wider spectrum, as well as for performing comparative and evolutionary studies related to tRNAs and their derivatives across all domains of life.

[Test] University of British Columbia (Canada): PicMin, a new statistical method for finding genes involved in environmental adaptation

Using genome scans to identify genes used repeatedly for adaptation

Adaptation occurring in similar genes or genomic regions in distinct lineages provides evolutionary biologists with a glimpse at the fundamental opportunities for and constraints to diversification. With the widespread availability of high throughput sequencing technologies and the development of population genetic methods to identify the genetic basis of adaptation, studies have begun to compare the evidence for adaptation at the molecular level among distinct lineages. However, methods to study repeated adaptation are often oriented towards genome-wide testing to identify a set of genes with signatures of repeated use, rather than evaluating the significance at the level of an individual gene. In this study, we propose PicMin, a novel statistical method derived from the theory of order statistics that can test for repeated molecular evolution to estimate significance at the level of an individual gene, using the results of genome scans. This method is generalizable to any number of lineages and indeed, statistical power to detect repeated adaptation increases with the number of lineages that have signals of repeated adaptation of a given gene in multiple lineages. An implementation of the method written for R can be downloaded from https://github.com/TBooker/PicMin.
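
The abstract says PicMin is derived from the theory of order statistics. As a rough illustration of that idea only (this is not PicMin's actual procedure, and the function name `min_p_significance` is hypothetical), the sketch below uses a textbook order-statistics fact. If a gene has n independent p-values, one per lineage, and each is uniform on [0, 1] under the null hypothesis, then the smallest of them is at most x with probability 1 - (1 - x)^n.

```python
def min_p_significance(p_values):
    """Probability that the smallest of n independent Uniform(0, 1) p-values
    is at most the smallest value observed.

    A generic order-statistics calculation, not the PicMin algorithm.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(p < 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    smallest = min(p_values)
    # P(min <= x) = 1 - P(all n values > x) = 1 - (1 - x)^n
    return 1.0 - (1.0 - smallest) ** n


# Hypothetical genome-scan p-values for one gene in three lineages.
print(min_p_significance([0.01, 0.20, 0.60]))
```

The point of the sketch is only that a small p-value in one lineage is less surprising once several lineages have been scanned, which is why a per-gene test across lineages needs this kind of correction. For the real method, see the R implementation linked in the abstract.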

The ad for the 40 fully funded PhD positions at Université Paris-Saclay mentioned at the beginning:

https://agristok.net/2022/03/23/40-fourty-fully-funded-phd-positions-in-plant-sciences-agriculture-food-environment-and-biology-at-university-of-paris-saclay-in-france/