Causal Structure Learning (CSL)#
Spread of COVID-19#
In the era of causal revolution, identifying the causal effect of an exposure on the outcome of interest is an important problem in many areas. Under a general causal graph, the exposure may have a direct effect on the outcome and also an indirect effect regulated by a set of mediators (or intermediate variables). For instance, during the outbreak of Coronavirus disease 2019 (COVID-19), the Chinese government has taken extreme measures to stop the virus spreading such as locking Wuhan down on Jan 23rd, 2020, followed by 12 other cities in Hubei, known as the “2020 Hubei lockdowns”. This approach (viewed as the exposure), directly blocked infected people leaving from Hubei; and also stimulated various quarantine measures taken by cities outside of Hubei (as the mediators), which further decreased the migration countrywide in China, and thus indirectly control the spread of COVID-19. Quantifying the causal effects of 2020 Hubei lockdowns on reducing the COVID-19 spread regulated by different cities outside Hubei is challenging but of great interest for the current COVID-19 crisis. An analysis of causal effects that interprets the causal mechanism contributed via individual mediators is thus very important.
Gene Expression Traits in Yeast#
Over recent decades, causal discovery attracts more and more attention by disentangling the complex causal relationship in various fields. Compared to 4 to 5 million single nucleotide polymorphisms (SNPs) in a person’s genome, much fewer non-spurious genes/proteins that systematically regulate the expression of the phenotype of interest are identified. We focus on a real application of gene expression traits in yeast (Brem & Kruglyak, 2005) to discover important causal features on explaining the gene expression of interest. This dataset collected 104 yeast segregants simulated by two genetically diverse strains, BY4716 and RM11-1a, and each segregant contains thousands of genotypes that contribute to rich phenotypic diversity. A primary goal in genetics is to study how different genotypes influence the target heritable traits of interest. Due to high-dimensional genes, formally named as quantitative trait loci (QTLs), involved in this study, identifying a more parsimonious causal graph is desired to reveal the true necessary dependencies that presents the essential causality towards the outcome of interest is desired. In this study, we are interested in identifying the genes whose expression levels affect the genetic variant YER124C, which is daughter cell-specific protein, may participate in pathways regulating cell wall metabolism; deletion affects cell separation after division and sensitivity to drugs targeted against the cell wall.