Research projects
Academic year 2021/22
ISBN: 978-5-7422-7814-6
A transcriptome assembly from fragments of the annelids Pygospio elegans (Spionidae, Annelida) and Arenicola marina (Arenicolidae, Annelida) | Zoological Institute of the RAS

student: Aleksandr Chen
supervisor: Elena Novikova

Slides
GitHub
Annelids, like many other invertebrates, replace lost body parts in a process called regeneration. Polychaetes, a class of mostly marine annelid worms, differ widely in their regenerative capacity. For instance, Pygospio elegans (Spionidae, Annelida) regenerates both head and tail segments, whereas Arenicola marina (Arenicolidae, Annelida) does not regenerate lost segments. The aim of the project is to assemble the transcriptomes of these two polychaetes, Pygospio elegans and Arenicola marina, and to prepare data for further investigation of genes responsible for gradient expression in the annelid body.
Transcriptome analysis of Platynereis dumerilii (Nereididae, Annelida) and Pygospio elegans (Spionidae, Annelida) at different stages of anterior and posterior regeneration | Zoological Institute of the RAS

student: Anna Koroleva
supervisor: Elena Novikova

Slides
GitHub
The project aims to identify genes that are activated or suppressed at each stage of regeneration at the head and tail regeneration sites compared to time point 0. It is also necessary to assign these genes to biological processes and to build heat maps that reflect the expression dynamics of selected conserved genes during regeneration.
"Split" repeat resolution for long reads | Center for bioinformatics and algorithmic biotechnology, SPbSU

student: Grigoriy Bukley
supervisor: Dmitry Antipov

Slides
GitHub
Genome assemblers use de Bruijn graphs built from reads. However, some of the information in the original reads remains unused in the graph. Because of read errors and imperfections of genome assemblers, unresolved repeats remain in the de Bruijn graph. This project implements several approaches to repeat resolution for the LJA assembler based on Split methods.
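
To illustrate why repeats stay unresolved, here is a minimal sketch that builds a toy de Bruijn graph from error-free reads and lists the nodes where the path branches. The function names, the value of k and the example reads are assumptions made for illustration; this is not code from the LJA assembler.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a de Bruijn graph as {(k-1)-mer: set of successor (k-1)-mers}."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph):
    """Nodes with more than one outgoing edge (candidate unresolved repeats)."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Toy reads in which the repeat "GCG" occurs in two different contexts.
reads = ["ATGCGTA", "CCGCGAT"]
graph = build_de_bruijn(reads, k=4)
print(branching_nodes(graph))
```

The node `GCG` has two successors, so the graph alone cannot tell which exit follows which entry. A read that spans the whole repeat carries exactly the information needed to decide.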
Role of Drosophila Chromatin Remodeling Factor CHD1 in regulation of gene expression | NRC "KI" - PNPI

students: Zhanna Repinskaia, Alexander Zhuravlev
supervisor: Alexander Konev

Slides
GitHub
The aim of this project is to study the role of the chromatin remodeling factor CHD1 (Chromo-ATPase/Helicase-DNA-binding protein 1) in the process of dosage compensation (DC) in Drosophila melanogaster. We have previously demonstrated that CHD1 is unique among Drosophila chromatin-remodeling factors in its specific recruitment to the male X chromosome, suggesting a role in DC. To investigate the specific roles (if any) of CHD1 in dosage compensation and to assess additional functions in the regulation of gene expression, we sequenced (using an MGI platform) rRNA-depleted total RNA from wild-type and Chd1 mutant male and female larvae. The main task of this project is to analyze the acquired data: quality control, alignment, and differential gene expression analysis for the X chromosome and the autosomes of Drosophila males and females.
Studying Salmonella gene expression dynamics in response to novobiocin | ITMO University

students: Semyon Kupriyanov, Valeriia Ladyhina
supervisor: Aleksandr Tkachenko

Slides
GitHub
The object of the study is Salmonella enterica, a bacterium largely resistant to novobiocin and to alterations of DNA supercoiling in general. We aim to analyze gene expression changes in Salmonella cultures grown on media with varying antibiotic concentrations (several time points and concentrations), to identify clusters of co-expressed genes, and to characterize these clusters from a functional point of view.
Developing best practices for semi-automatic single-cell data annotation | ImmunoMind Inc.

students: Ivan Semenov, Anton Muromtsev, Vladimir Shitov
supervisors: Daniil Litvinov, Vasily Tsvetkov

Slides
GitHub
Single-cell sequencing is paving the way for more accurate precision medicine. One of the most important steps in single-cell data analysis is cell type labeling. This is a very time-consuming process, and its automation is a task of current interest. The goal of this project is to use modern machine learning approaches to build a semi-automatic single-cell data annotation tool.
Structure-based modeling of cysteine and serine disease variants of human proteome | Skoltech

student: Dmitrii Podgalo
supervisor: Petr Popov

Slides
GitHub

The goal of this project is to model structures of human proteins with disease-associated amino acid substitutions. Two types of amino acid substitutions are selected: X to Cysteine or X to Serine (X is any amino acid residue) – these residues are often used as the attachment points for covalent drugs.
Determining the effectiveness of momi2 for inferring demographic history in GADMA | ITMO

students: Kseniya Struikhina, Valentin Mikhalchuk
supervisor: Ekaterina Noskova

Slides
GitHub
Inference of demographic history parameters for individual chromosomes and comparison with real data.
Application of machine learning methods to approximate demographic history parameters from allele frequency spectrum | ITMO

student: Elizaveta Gorelkina
supervisor: Ekaterina Noskova

Slides
GitHub
Complete genetic data are not used directly to infer demographic history parameters, as this would require a lot of computing resources. Instead, various summary statistics computed from these data are used. One of these statistics is the allele frequency spectrum, which in the simplest case can be represented as a multidimensional tensor (a matrix for two populations). Existing methods for inferring demographic history parameters (dadi, moments) use local optimization algorithms, which converge faster when the initial parameter values are close to the optimum. This project proposes applying simple machine learning methods to approximately predict the demographic history parameters of two populations. Two algorithms are offered to choose from: random forests or convolutional neural networks. The task is to generate data and to train and validate the selected method on them.
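
As a small illustration of the allele frequency spectrum for two populations, the sketch below tabulates a joint spectrum as a 2D array. It assumes the input is the per-site count of derived alleles in each population sample. The function name and the toy counts are hypothetical, and this is not the dadi or moments API.

```python
import numpy as np


def joint_sfs(counts_pop1, counts_pop2, n1, n2):
    """Joint allele frequency spectrum for samples of n1 and n2 chromosomes.

    Entry [i, j] is the number of sites where the derived allele was seen
    i times in population 1 and j times in population 2.
    """
    sfs = np.zeros((n1 + 1, n2 + 1), dtype=int)
    # np.add.at accumulates correctly when the same (i, j) cell repeats.
    np.add.at(sfs, (np.asarray(counts_pop1), np.asarray(counts_pop2)), 1)
    return sfs


# Five toy sites; sample sizes of 2 and 3 chromosomes (assumed values).
counts_pop1 = [1, 0, 2, 1, 1]
counts_pop2 = [0, 1, 2, 0, 3]
sfs = joint_sfs(counts_pop1, counts_pop2, n1=2, n2=3)
print(sfs)
```

A matrix of this kind is a natural input for either a random forest (after flattening) or a convolutional network (as a one-channel image).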
Genome-wide association study (GWAS) and construction of polygenic risk scores (PRS) for height and weight | Genotek

students: Mark Zorin, Dmitrii Iliushchenko
supervisor: Alexander Rakitko

Slides
GitHub
A genome-wide association study (GWAS) is a search for single nucleotide polymorphisms (SNPs) associated with a human phenotype. GWAS is often used to identify variants linked to diseases or to the risk of developing them. In our study we analyzed the genetic variants of a large cohort of Russians to identify SNPs significantly associated with changes in body mass index.
In silico modeling of coverage profiles for multiplex target panels | ParSeq Lab

student: Anastasia Kislova
supervisor: Ivan Pyankov

Slides
GitHub
Developing a multiplex target panel for polymerase chain reaction means designing highly specific primers so as to minimize the number of amplicons for the target regions. The panels must be validated in vitro, but in silico validation would improve the existing pipeline. The goal of this project was to adapt an existing tool, DegenPrimer, for in silico validation of designed target panels and to check how well its output correlates with real data.
Pipeline for a targeted gene sequencing panel validation | ParSeq Lab

student: Maria Lopatkina
supervisor: Tamara Simakova

Slides
GitHub
Targeted sequencing is a rapid and cost-effective way to detect known and novel variants and is widely applied in medicine. Targeted sequencing requires upfront selection and isolation of genes or regions of interest, typically by either PCR amplification or hybridization-based capture methods. However, not all regions of interest are suitable for analysis, so it is important to be aware of the analytical characteristics and limitations of a panel. The goal of the project was to develop a pipeline for validation of targeted gene sequencing panels.
Clustering Hi-C contact graphs using Graph Neural Networks | Center for Bioinformatics and Algorithmic Biotechnology, SPbSU

student: Fyodor Velikonivtsev
supervisors: Ivan Tolstoganov, Anton Korobeynikov

Slides
GitHub
A typical approach to binning with Hi-C data consists of two steps: constructing a Hi-C contact graph, where nodes are contigs and edge weights are normalized numbers of Hi-C links between them, and clustering this graph. Existing Hi-C-based binning tools, such as bin3C and HiCBin, rely on community detection algorithms, such as Infomap and Leiden. Recent advances in genome binning implemented in tools such as VAMB clearly show the potential of neural networks for the binning problem. We therefore hope that recent advances in graph clustering with Graph Neural Networks (GNNs) might provide a more general and possibly more accurate way to cluster Hi-C contact maps, which would in turn yield more accurate and complete MAGs and improve downstream metagenomic analysis.
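
The first step, building the contact graph, can be sketched as follows. The normalization used here (dividing the link count by the product of contig lengths) is only one simple choice, assumed for illustration. bin3C and HiCBin use their own normalization schemes, and the function and variable names are hypothetical.

```python
from collections import Counter


def build_contact_graph(links, contig_lengths):
    """Return {(contig_a, contig_b): weight} for an undirected contact graph.

    links: iterable of (contig, contig) pairs, one per Hi-C read pair.
    Weight = link count / (length_a * length_b), an assumed normalization.
    """
    raw = Counter()
    for a, b in links:
        if a != b:  # links within one contig carry no binning signal
            raw[tuple(sorted((a, b)))] += 1
    return {
        (a, b): count / (contig_lengths[a] * contig_lengths[b])
        for (a, b), count in raw.items()
    }


links = [("c1", "c2"), ("c2", "c1"), ("c1", "c3"), ("c2", "c2")]
contig_lengths = {"c1": 1000, "c2": 2000, "c3": 500}
graph = build_contact_graph(links, contig_lengths)
for edge, weight in sorted(graph.items()):
    print(edge, weight)
```

The resulting weighted edge list is the input for the second step, whether that is a community detection algorithm or a GNN-based clustering model.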
Analysis and construction of SARS-CoV-2 neutralizing ligands with extensive Spike binding | Laboratory of Biomolecular NMR, Saint Petersburg State University

students: Aleksandr Kovalenko, Xenia Sukhanova
supervisors: Olga Lebedenko, Nikolai Skrynnikov

Slides
GitHub
Protein minibinders (MP1 and MP3) have been designed in silico against the RBD of the wild-type spike protein to prevent SARS-CoV-2 entry into cells. However, the emergence of new SARS-CoV-2 variants hinders the effective use of these MPs. The main goal of this project is to develop a workflow for estimating the binding affinity of MP1 and MP3 to the RBD of new SARS-CoV-2 variants (alpha, delta, delta+, omicron) and to propose a way to optimize the MP1 and MP3 sequences for stronger interaction with the RBDs of the new variants. As a result, a collection of Python scripts is provided for structure manipulation, simulations, binding analysis and results processing. The interaction of MP1 and MP3 with the RBDs of the new SARS-CoV-2 variants is analyzed. For MP1 no beneficial mutations were observed, while an MP3(D37R) mutant with enhanced binding to the delta+ RBD is proposed.

Differential expression analysis of macrophage RNA sequencing data using the Hobotnica tool | Ivannikov Institute for System Programming of the RAS, Information Systems Department

students: Anton Zhelonkin, Alexandra Belyaeva
supervisor: Evgeny Karpulevich

Slides
GitHub
Macrophages are a central component of innate immunity and play an important role in host defense. In our work, we study the effect of lipopolysaccharide on macrophages derived from the two most polar subsets of blood monocytes from healthy donors (CD14+ and CD16+), as well as from an intermediate subset, and evaluate what happens at the transcriptomic level to the subset most prone to polarization. Tools for differential expression analysis of RNA sequencing data rely on mathematical statistics. To choose the most appropriate tool for specific data, we propose using the Hobotnica tool.


Search for homologs of egg-cell specific genes, study of their expression patterns and regulatory elements for the creation of effective constructs for genetic engineering | Skolkovo Institute of Science and Technology, Institute for Information Transmission Problems RAS

students: Elena Grigoreva, Anna Toidze
supervisors: Maria Logacheva, Artem Kasianov

Slides
GitHub
Using germline cell-specific promoters is an effective approach in genome editing. EC1.1 and EC1.2 are A. thaliana genes from the Egg Cell (EC) family that are specifically and highly expressed in egg cells. It was shown that using the promoters of these genes significantly improves genome editing, but no similar promoters are known for other plants. Since homologous genes can have similar functions, we hypothesized that EC homologs could have similar expression patterns and that their promoters could also be effective. The aim of our project is therefore to find functional analogs of EC genes in different crops and model plants and to explore their expression patterns and regulatory elements.


Research of signaling pathways and transcriptional factors (TF) activity alteration associated with acute myeloid leukemia (AML) | BostonGene

students: Iuliia Ruzhenkova, Ekaterina Osintseva
supervisor: Eleonora Belykh

Slides
GitHub
In the bone marrow, transcription factors (TFs) control the genes important for the maintenance of normal hematopoiesis. Dysregulation of TF activity can lead to hematological malignancies, including acute myeloid leukemia (AML). TFs are currently considered promising drug targets, and research on them (as well as on the signaling pathways responsible for their activation) is relevant for the development of new therapeutic strategies. In the present study, we analyze publicly available NGS data (RNA-seq and scRNA-seq) from the bone marrow of AML patients and healthy donors: after preprocessing the data, we run PROGENy and DoRothEA to determine alterations in the activity of important molecular pathways and TFs. To validate the results, we perform a literature review, Kaplan-Meier survival analysis, and visualization of TFs in single cells.


Correlation between DNA sequence and chromatin structure | EPAM Systems

students: Kirill Kirilenko, Ivan Kozlov
supervisor: Gennadii Zakharov

Slides
GitHub
The goal is to determine whether DNA sequence itself can be a good predictor of 3D nuclear structure.


Classification of β-arches based on their 3D structure | Saint-Petersburg State University

students: Rustam Basyrov, Leonid Zhozhikov
supervisor: Stanislav Bondarev

Slides
GitHub
β-arches are structural elements of proteins that include two β-strands united by a turn. They are present in β-solenoid proteins as well as in amyloid aggregates. In the original article that proposed the term, such structures were divided into groups based on the conformation of the amino acids at the turn sites and on the number of amino acids in them. In this project, we analyzed the diversity of 3D β-arch organizations.


Analysis of differential expression of genes involved in NO-signaling in synucleinopathies | Saint-Petersburg State University

students: Aleksandra Livanova, Anna Kapitonova
supervisor: Stanislav Bondarev

Slides
GitHub
Synucleinopathies are neurodegenerative diseases characterized by the aggregation of proteins, in particular alpha-synuclein, in brain neurons. According to bioinformatic predictions, nitric oxide synthase 1 adaptor protein (NOS1AP) is also capable of forming protein aggregates in neurons. Moreover, it directly interacts with alpha-synuclein. This led to the hypothesis that NO signaling could be involved in the pathogenesis of synucleinopathies. The aim of the project was to evaluate changes in the expression level of NOS1AP and other NO-signaling genes in brain samples from patients with synucleinopathies. Four open datasets with raw RNA-seq reads from different brain regions of patients with synucleinopathies were analyzed. Although NOS1AP expression did not change significantly, tissue-specific differential expression of other NO-signaling genes was demonstrated.


Analysis of variable evolutionary constraint within a single ORF | Bioinformatics Institute, Research Center of Medical Genetics

student: Oksana Kotovskaya
supervisors: Yury Barbitoff, Mikhail Skoblov

Slides
GitHub
Genetic variants leading to loss of function are not found in all genes. If a gene is under selection pressure, protein-truncating variants (PTVs) are much less common in it (Cassa C., 2017). Most often, such genes have important functions, and such a catastrophic change in the protein leads to various diseases or death (Samocha K., 2014). In this work, we are interested in cases where the division of genes into conserved (that is, under selection) and non-conserved (that is, free from selection) becomes less clear-cut, namely, cases where non-conserved regions are found within relatively conserved genes. This work is devoted to implementing an algorithm to search for such regions.
Goal: to estimate the evolutionary conservation of individual regions within a single ORF.
The key task in achieving this goal was to implement an algorithm based on a hidden Markov model (HMM) that determines the conservation of individual regions of a protein-coding sequence (CDS).
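
One way to picture an HMM-based segmentation of a CDS is a two-state model decoded with the Viterbi algorithm. The sketch below is an illustration under stated assumptions, not the project's implementation: the state names, the 0/1 per-codon observation (1 meaning a protein-truncating variant is seen at that codon in population data) and all probabilities are hypothetical.

```python
import math

STATES = ("conserved", "tolerant")
# All probabilities below are assumed values chosen for illustration.
START = {"conserved": 0.5, "tolerant": 0.5}
TRANS = {
    "conserved": {"conserved": 0.9, "tolerant": 0.1},
    "tolerant": {"conserved": 0.1, "tolerant": 0.9},
}
EMIT = {"conserved": {0: 0.95, 1: 0.05}, "tolerant": {0: 0.5, 1: 0.5}}


def viterbi(observations):
    """Most likely state path for a sequence of 0/1 observations (log space)."""
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor of state s at this position.
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][obs])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


observations = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
print(viterbi(observations))
```

With these toy parameters the first six codons are labeled as conserved and the last four as tolerant, which is the kind of within-ORF boundary the project looks for.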

Analysis of the effects of combinations of single nucleotide polymorphisms within a single codon | Bioinformatics Institute, Research Center of Medical Genetics

student: Ekaterina Kravchuk
supervisors: Yury Barbitoff, Mikhail Skoblov

Slides
GitHub
Creating a tool that correctly predicts the effects of polymorphisms within a single codon.
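
A short sketch shows why variants in the same codon need to be considered together: two substitutions annotated separately can each predict a different consequence from the one they produce jointly. The helper names and the example codon are hypothetical. The codon table is the standard genetic code.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons ordered T, C, A, G at each position.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_snvs(codon, snvs):
    """Apply {position: alt_base} substitutions (0-based) to a codon."""
    bases = list(codon)
    for position, alt in snvs.items():
        bases[position] = alt
    return "".join(bases)


def describe(codon, snvs):
    """Return e.g. 'R>W' for the amino acid change caused by the SNVs."""
    return f"{CODON_TABLE[codon]}>{CODON_TABLE[apply_snvs(codon, snvs)]}"


codon = "CGA"  # arginine
print(describe(codon, {0: "T"}))          # first variant alone
print(describe(codon, {2: "G"}))          # second variant alone
print(describe(codon, {0: "T", 2: "G"}))  # both variants on the same haplotype
```

Taken alone, the first variant looks like a stop gain (`R>*`) and the second is synonymous (`R>R`), but together they give a missense change (`R>W`).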
Evaluation of the evolutionary conservation of uORFs | Bioinformatics Institute, Research Center of Medical Genetics

student: Dmitrii Poliakov
supervisors: Yury Barbitoff, Mikhail Skoblov

Slides
GitHub
The method highlights ORFs that encode functional proteins important for human survival. The idea of the current project is to apply the method to upstream open reading frames (uORFs) to find those encoding functional products. uORFs are relatively short ORFs found upstream of the main ORF in eukaryotic mRNAs. They received little attention for a long time because of the "one mRNA, one protein" paradigm for eukaryotic mRNA. Then ribosome profiling emerged, a method that maps translating ribosomes on mRNAs, and it became clear that translation occurs on many uORFs. Whether the majority of uORF products are functional is still a matter of discussion. Our goal is to find uORFs encoding functional proteins.
Generation of possible single-nucleotide variants with a given effect on protein-coding sequence | Bioinformatics Institute, Research Center of Medical Genetics

student: Oxana Kolpakova
supervisors: Yury Barbitoff, Mikhail Skoblov

Slides
GitHub
Creation of a tool that generates pathogenic and benign SNPs for OMIM genes by substituting one nucleotide of a codon so that the substitution results in the same amino acid change.
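
The core enumeration step can be sketched as follows: for a given codon, list every single-nucleotide substitution that leads to a chosen amino acid. This is a hedged illustration with hypothetical names, not the project's tool. It uses the standard genetic code and ignores strand, transcript and splicing context.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons ordered T, C, A, G at each position.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def variants_with_effect(codon, target_aa):
    """List (position, alt_base, new_codon) for single substitutions giving target_aa."""
    hits = []
    for position in range(3):
        for alt in BASES:
            if alt == codon[position]:
                continue  # not a substitution
            mutated = codon[:position] + alt + codon[position + 1:]
            if CODON_TABLE[mutated] == target_aa:
                hits.append((position, alt, mutated))
    return hits


# All single-nucleotide changes of the lysine codon AAA that yield asparagine.
print(variants_with_effect("AAA", "N"))
```

For the lysine codon `AAA`, two different substitutions at the third position give the same lysine-to-asparagine change.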
Identification of genetic variants affecting branchpoints within human introns | Bioinformatics Institute, Research Center of Medical Genetics

students: Alisa Sergeeva, Irina Veretenenko
supervisors: Yury Barbitoff, Mikhail Skoblov

Slides
GitHub
The aim of the project is to evaluate the effect of intronic variants on the position and strength of branchpoints, the sites in introns that guide the splicing process. We used two existing tools, Branchpointer and BPP, to predict branchpoint probability in reference human introns and to identify how intronic variants change it. We compared pathogenic variants from ClinVar with high-frequency variants from gnomAD. We observed that BPP and Branchpointer are similar in how they separate ClinVar and gnomAD variants but differ in their predicted probabilities. We created a new ML predictor, which showed good performance and quality metrics. Next, we plan to verify this model on the ClinVar and gnomAD datasets and to estimate the role of pathogenic variants in the splicing process.
Dissecting the role of gene expression variability in complex traits | Bioinformatics Institute

student: Mikhail Slizen
supervisor: Yury Barbitoff

Slides
GitHub
A genome-wide association study (GWAS) is a technique used to look for genome sequence variations that affect the development of complex traits. In recent years, GWAS results have been published for thousands of different traits, including those from two of the world's largest datasets, UK Biobank and FinnGen. Changes in gene expression levels are known to be one of the main mechanisms behind the small effects of genetic variants detected by GWAS. In this project, we test the hypothesis that not only the level of gene expression but also the degree of expression variability is associated with the influence of a gene on complex human traits.
Borgs - new entities in archaeal genomes? | Saint-Petersburg State University

students: Vera Emelianenko, Alexandra Kolodyazhnaya
supervisors: Mikhail Rayko, Lavrentiy Danilov

Slides
GitHub
In a preprint published on bioRxiv in July 2021, the authors describe mysterious new entities in archaeal genomes (Al-Shayeb et al., 2021). They define them in the following way: "We infer that these are a new type of archaeal extrachromosomal element with a distinct evolutionary origin. Gene sequence similarity, phylogeny, and local divergence of sequence composition indicate that many of their genes were assimilated from methane-oxidizing Methanoperedens archaea. We refer to these elements as "Borgs"". We used the Borg sequences published in this paper, along with open-source metagenomic data, to find out what Borgs are, how they can be defined, and whether they have representatives in metagenomic data.


Systematics and classification of plasmids | Bioinformatics Institute

students: Ekaterina Vostokova, Pavel Vychik
supervisor: Mikhail Rayko

Slides
GitHub
Investigate an approach to plasmid systematics based on Rep protein sequences, and develop an automated pipeline for classifying newly sequenced plasmids.
Molecular mechanisms behind the life cycle evolution and speciation in hydroids of the Arctic region | Bioinformatics Institute, Saint Petersburg State University, Moscow State University, N.K. Koltsov Institute of Developmental Biology

student: Polina Guro
supervisors: Lavrenty Danilov, Stanislav Kremnyov

Slides

The molecular mechanisms of speciation in hydroids have never been studied, neither has the relationship between the evolution of the life cycle and speciation ever been considered. The hydroid Sarsia lovenii was chosen as the object. Recently, in S. lovenii, breeding season polymorphism has been found to be associated with life cycle polymorphism. Colonies of one morph produce normally developed free-floating medusas, while colonies of the second morph produce attached gonophores - medusoids. The morphs identified represent phenological populations: in the example of S. lovenii, we can observe the initial stage of sympatric speciation. Thus, due to the object we have chosen, we can study the molecular mechanisms of speciation associated with the divergence of sympatric populations in breeding time and associated with the evolution of the life cycle.
Search for human proteins capable of co-aggregating with SARS-CoV-2 proteins | SPbSU

student: Evgeniia Sevasteeva
supervisor: Stanislav Bondarev

Slides
GitHub
The goal is to assess the probability of human amyloidosis developing as a result of co-aggregation with SARS-CoV-2 proteins. To solve this problem, the AmyloComp program should be used; it was developed in an SPbSU laboratory together with the A.V. Kayava group (University of Montpellier, France).
Studying complex structural variations in cancer using long reads | NIH / NCI

student: Olga Kalinichenko
supervisor: Mikhail Kolmogorov

Slides
GitHub
Cancer is driven by genomic changes. Small-scale mutations have been extensively catalogued across various types of cancer using short-read sequencing data. However, it is more difficult to detect large and complex structural rearrangements due to read mapping ambiguities. In this project, we will use long-read sequencing to explore the complex genomic changes in cancer genomes. The approach will build on our experience in computational genomics, graph algorithms and de novo assembly.
Integration of the ADASTRA database as a novel annotator module in the OpenCRAVAT pipeline | IITP RAS

students: Stepan Kuznetsov, Mikhail Fofanov, Andrey Suponin
supervisor: Artem Kasyanov

Slides
GitHub
Development of an annotator and a widget and their integration into the OpenCRAVAT system.
Age in gene regulatory networks | SciLifeLab, Stockholm University

student: Yuliya Burankova
supervisor: Erik Zhivkoplias

Slides
GitHub
The wide availability of system-level gene expression datasets gave rise to a variety of reverse-engineering methods that aim to reconstruct hidden regulatory gene-gene and gene-protein relationships. Such relationships form a gene regulatory network (GRN) that controls the organism's response to changes in the environment. The GRNs we know are the result of a long biological evolution. With phylogenomic analysis, it is possible to classify genes by age, based on the oldest species that carry an orthologous gene. For protein-protein interaction networks in yeast and human, it was shown that proteins of the same age tend to interact more. The goal of this project is to explore whether this preference for interactions between genes of similar age holds in gene regulatory networks, in particular in those that describe direct regulatory interactions (transcription factor-target gene).
Analysis of 5'-isomiR targeting | National Research University HSE

student: Alexandra Gorbonos
supervisor: Stepan Nersisyan

Slides
GitHub
MicroRNAs are short non-coding RNA molecules that post-transcriptionally regulate gene expression. MicroRNA expression has been shown to play an important role in various pathologies, including various types of cancer. Studies have shown that there is variability in the nucleotide sequences at the 5'- and 3'-ends of mature miRNAs - miRNA isoforms. The project analyzes targets of miRNA isoforms in 31 cancers based on mRNA and miRNA sequencing data from The Cancer Genome Atlas (TCGA) project, and proposes a method for assessing the activity of miRNA isoforms.
Analysis of RecQ involvement in primed adaptation in the type I-E CRISPR-Cas system of Escherichia coli | Skoltech

student: Anna Shiriaeva
supervisor: Konstantin Severinov

Slides

Analysis of primed adaptation efficiency and prespacer generation efficiency in cells with the recQ deletion.
Benchmark creation for drug-target interaction (DTI) prediction task | JetBrains Research

student: Dmitrii Traktirov
supervisor: Ellen Kartysheva

Slides
GitHub
The drug-target interaction (DTI) prediction task plays an important role in drug discovery, which aims to identify new drugs for biological targets. Automating the prediction will speed up the creation of new drugs. Many machine learning models now address this problem; however, because of the huge number of different datasets and testing protocols, it is difficult to compare the models with each other. A single unified benchmark is therefore needed.
Domain based burden analysis in PD genes | McGill

student: Artem Kosmin
supervisor: Konstantin Senkevich

Slides
GitHub
1) Prepare a list of domains with chromosome position references for each PD-causing gene (around 20 familial plus 80 GWAS). 2) Write a script for domain-based burden analysis; the QC script was prepared during the previous project. 3) The project will be done using the Terra cloud. Knowledge of R, awk, and Linux will be helpful.
Improving Quality of Epitope Mapping by Deep Learning Methods | JetBrains Research

student: Simon Tsirikov
supervisor: Natalia Zenkova

Slides
GitHub
The task of epitope mapping is to determine the region on the surface of an antigen (a specific type of protein recognized by the organism as harmful) to which an antibody (a special protein produced to destroy the antigen) will attach. To reduce the cost and increase the speed of drug development, computer modeling of this process is used. Existing models cope well with the task when the input is an antigen-antibody complex, but if nothing is known about a possible antibody, the model has to work with the antigen alone. For this formulation of the problem, previous researchers maximized recall, while the proposed model, built on the Transformer architecture, achieves higher precision, which is more relevant for applied tasks.
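
The difference between the two metrics can be seen on a toy per-residue example. The labels below are invented for illustration (1 marks an epitope residue) and do not come from the project's data.

```python
def precision_recall(true_labels, predicted_labels):
    """Precision and recall for binary per-residue labels (1 = epitope residue)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cautious = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]    # few predictions, all correct
permissive = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # finds everything, many false hits

print(precision_recall(true_labels, cautious))    # (1.0, 0.5)
print(precision_recall(true_labels, permissive))  # (0.5, 1.0)
```

A high-precision predictor returns fewer residues, but each one is more likely to be worth testing experimentally, which is why precision matters in applied settings.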
Metagenomic analysis of diversity and properties of bacterial communities of White Sea sponges | IGB RAS, Skoltech

student: Anastasiia Rusanova
supervisors: Dmitry Sutormin, Svetlana Dubiley

Slides
GitHub
Sponges (phylum Porifera) form symbiotic relationships with communities of microorganisms. Sponges and their symbionts produce various pharmacologically active substances. These communities differ in taxonomic composition from those of the surrounding seawater. Metagenomic analysis of the microbiome makes it possible to determine the taxonomic diversity and properties of the microbial community. We investigated potential bacterial symbionts of our sponges to find out their specific symbiotic features.
Potential cancer dependencies in the context of LKB1 loss in non-small cell lung cancer (NSCLC) | Clarivate

student: Tatiana Kikalova
supervisor: Aliaksei Holik

Slides
GitHub
One of the characteristics of NSCLC is the loss of the tumor-suppressor kinase LKB1 (liver kinase B1). LKB1 is known for its ability to induce apoptosis, regulate cell polarity and differentiation, and suppress the growth, invasion, and metastasis of tumor cells. Although the inhibition of tumor suppressors such as LKB1 gives an advantage in avoiding apoptosis, it also affects normal pathways, and thus the tumor cells have to rely on alternative means of survival. This gives us an opportunity to identify effective targets in these alternative pathways whose inhibition would affect only tumor cells without damaging normal tissues.
Prediction of pathogenicity of genetic variants in Kozak sequences | Bioinformatics Institute, Research Center of Medical Genetics

student: Marianna Baranovskaia
supervisors: Yury Barbitoff, Mikhail Skoblov

Slides
GitHub
The Kozak sequence is the consensus nucleotide context of the start codon in most eukaryotic mRNAs and is involved in translation initiation. The Kozak sequence differs between mRNAs, and it was reported that different Kozak sequences influence the translation level differently. In 2014, a group of scientists published direct measurements of the translation level for every possible Kozak sequence containing the classic AUG start codon and built a model of how particular nucleotides at particular positions in the Kozak sequence influence translation efficiency. The human genome has many variable positions: some variants are annotated as disease-related, some are considered benign, and the significance of others is currently uncertain. If a variant is located in a protein-coding sequence or another well-studied sequence, its effect can be easy to predict, but if it is located in a non-coding sequence the prediction becomes less reliable. In this project we tried to combine the data on Kozak sequence efficiency with genetic variants located in Kozak sequences to predict the possible pathogenicity of such variants and thereby improve medical genetic analysis.
Studying the alternative ORFs in genes associated with neurological and psychiatric diseases | McGill

students: Eduard Akhmetgaliev, Ekaterina Kershinskaya
supervisor: Konstantin Senkevich

Slides
GitHub
OpenVar is the first tool for genomic variant annotation and functional effect prediction that supports deep open reading frame (ORF) annotation and polycistronic annotation of human, mouse, rat and fruit fly transcripts. OpenVar builds on the well-known and extensively used SnpEff tool (Cingolani et al., 2012), but also offers the possibility to predict variant effects in alternative ORFs as defined in OpenProt (Brunet et al., 2019). The aim of the project is to analyze OpenVar results for 16 VCF files with GWAS SNPs associated with 7 neurological and 9 psychiatric disorders, in order to compile a list of SNPs that are non-pathogenic in the canonical ORF but alter an alternative ORF with moderate or high impact and affect expression in brain tissues, as candidates for future functional studies.