97 result(s) for "Xing, Haipeng"
Deciphering hierarchical organization of topologically associated domains through change-point testing
Background: The nucleus of eukaryotic cells spatially packages chromosomes into a hierarchical and distinct segregation that plays critical roles in maintaining transcription regulation. High-throughput methods of chromosome conformation capture, such as Hi-C, have revealed topologically associating domains (TADs) that are defined by biased chromatin interactions within them. Results: We introduce a novel method, HiCKey, to decipher hierarchical TAD structures in Hi-C data and compare them across samples. We first derive a generalized likelihood-ratio (GLR) test for detecting change-points in an interaction matrix that follows a negative binomial distribution or general mixture distribution. We then employ several optimal search strategies to decipher hierarchical TADs with p values calculated by the GLR test. Large-scale validations of simulation data show that HiCKey has good precision in recalling known TADs and is robust against random collisions of chromatin interactions. By applying HiCKey to Hi-C data of seven human cell lines, we identified multiple layers of TAD organization among them, but the vast majority had no more than four layers. In particular, we found that TAD boundaries are significantly enriched in active chromosomal regions compared to repressed regions. Conclusions: HiCKey is optimized for processing large matrices constructed from high-resolution Hi-C experiments. The method and theoretical result of the GLR test provide a general framework for significance testing of similar experimental chromatin interaction data that may not fully follow negative binomial distributions but rather more general mixture distributions.
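The abstract describes the GLR change-point test only at a high level. As a reminder of the general structure of such a test (the generic single-change-point form for independent observations with density f(·; θ), not the paper's Hi-C-specific statistic for negative binomial or mixture counts), the statistic compares the best two-segment fit against the best single-segment fit:

\[
\Lambda_n \;=\; 2\max_{1 \le k < n}\left\{ \sup_{\theta_1,\theta_2}\Big[\sum_{i=1}^{k}\log f(x_i;\theta_1) + \sum_{i=k+1}^{n}\log f(x_i;\theta_2)\Big] \;-\; \sup_{\theta}\sum_{i=1}^{n}\log f(x_i;\theta) \right\},
\]

with a large value of \(\Lambda_n\) indicating a change-point near the maximizing \(k\); the p values HiCKey uses to build the TAD hierarchy are derived from tests of this type.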
A Bayesian model based computational analysis of the relationship between bisulfite accessible single-stranded DNA in chromatin and somatic hypermutation of immunoglobulin genes
The B cells in our body generate protective antibodies by introducing somatic hypermutations (SHM) into the variable region of immunoglobulin genes (IgVs). The mutations are generated by activation-induced deaminase (AID), which converts cytosine to uracil in single-stranded DNA (ssDNA) generated during transcription. Attempts have been made to correlate SHM with ssDNA using bisulfite to chemically convert cytosines that are accessible in the intact chromatin of mutating B cells. These studies have been complicated by the use of different definitions of “bisulfite accessible regions” (BARs). Recently, deep sequencing has provided much larger datasets of such regions, but computational methods are needed to enable this analysis. Here we leveraged the deep-sequencing approach with unique molecular identifiers and developed a novel hidden Markov model-based Bayesian segmentation algorithm to characterize the ssDNA regions in the IGHV4-34 gene of the human Ramos B cell line. Combining hierarchical clustering and our new Bayesian model, we identified recurrent BARs in certain subregions of both top and bottom strands of this gene. With this new system, the average size of BARs is about 15 bp. We also identified potential G-quadruplex DNA structures in this gene and found that the BARs co-locate with G-quadruplex structures in the opposite strand. Various correlation analyses show no direct site-to-site relationship between the bisulfite-accessible ssDNA and all sites of SHM, but most of the highly AID-mutated sites are within 15 bp of a BAR. In summary, we developed a novel platform to study single-stranded DNA in chromatin at base-pair resolution that reveals potential relationships among BARs, SHM and G-quadruplexes. This platform could be applied to genome-wide studies in the future.
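The abstract does not spell out the segmentation machinery. Purely to illustrate how a two-state hidden Markov model can label positions as bisulfite-accessible or protected from per-cytosine conversion calls (a generic sketch, not the authors' Bayesian segmentation algorithm; the emission, transition, and initial probabilities below are hypothetical), a Viterbi decoder might look like this:

    import numpy as np

    def viterbi_two_state(obs, p_emit, p_trans, p_init):
        """Most likely state path for a 2-state HMM over a binary observation
        sequence, computed in log space. obs: array of 0/1 (converted or not);
        p_emit[s] = P(obs == 1 | state s); p_trans[s, t] = P(next state t | s)."""
        n = len(obs)
        log_emit = np.where(obs[:, None] == 1, np.log(p_emit), np.log(1 - p_emit))
        log_delta = np.log(p_init) + log_emit[0]
        back = np.zeros((n, 2), dtype=int)
        for t in range(1, n):
            scores = log_delta[:, None] + np.log(p_trans)  # scores[s, t'] = delta[s] + log P(t'|s)
            back[t] = scores.argmax(axis=0)
            log_delta = scores.max(axis=0) + log_emit[t]
        path = np.zeros(n, dtype=int)
        path[-1] = log_delta.argmax()
        for t in range(n - 1, 0, -1):
            path[t - 1] = back[t, path[t]]
        return path

    # Hypothetical per-cytosine conversion calls along one strand: 1 = converted (accessible).
    obs = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0])
    path = viterbi_two_state(obs,
                             p_emit=np.array([0.05, 0.8]),              # state 0: protected, state 1: accessible
                             p_trans=np.array([[0.9, 0.1], [0.2, 0.8]]),
                             p_init=np.array([0.5, 0.5]))
    print("decoded states:", path)

The authors' method additionally places a Bayesian model over the segment structure and works with unique-molecular-identifier deep-sequencing reads, which this toy decoder does not attempt to capture.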
L1 Regularization for High-Dimensional Multivariate GARCH Models
The complexity of estimating multivariate GARCH models increases significantly with the increase in the number of asset series. To address this issue, we propose a general regularization framework for high-dimensional GARCH models with BEKK representations, and obtain a penalized quasi-maximum likelihood (PQML) estimator. Under some regularity conditions, we establish some theoretical properties, such as the sparsity and the consistency, of the PQML estimator for the BEKK representations. We then carry out simulation studies to show the performance of the proposed inference framework and the procedure for selecting tuning parameters. In addition, we apply the proposed framework to analyze volatility spillover and portfolio optimization problems, using daily prices of 18 U.S. stocks from January 2016 to January 2018, and show that the proposed framework outperforms some benchmark models.
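In generic form, an L1-penalized quasi-maximum likelihood estimator of the kind described above solves

\[
\hat{\theta}_{\mathrm{PQML}} \;=\; \arg\min_{\theta}\;\Big\{-\tilde{\ell}_n(\theta) \;+\; \lambda_n \sum_{j}\lvert\theta_j\rvert\Big\},
\]

where \(\tilde{\ell}_n\) is the Gaussian quasi-log-likelihood of the BEKK representation, \(\theta\) collects its coefficient matrices, and \(\lambda_n\) is a tuning parameter; the paper's exact penalty, regularity conditions, and tuning-parameter selection procedure are not reproduced here.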
High-Frequency Quote Volatility Measurement Using a Change-Point Intensity Model
Quote volatility is important in determining the cost of demanding liquidity in a high-frequency (HF) order market. This paper proposes a new model to measure quote volatility based on the point process and price-change duration. Specifically, we build a change-point intensity (CPI) model to describe the dynamics of price-change events for a given threshold level. The instantaneous volatility of the quote price can be calculated at any time from the price-change intensities. Based on this, we can quantify the cost of demanding liquidity for traders with different trading latencies by using integrated variances. Furthermore, we use the autoregressive conditional intensity (ACI) model proposed by Russell (1999) as a benchmark for comparison. The results suggest that our model performs better in both in-sample fit and out-of-sample prediction.
Statistical Surveillance of Structural Breaks in Credit Rating Dynamics
The 2007–2008 financial crisis had severe consequences for the global economy, and an intriguing question related to the crisis is whether structural breaks in the credit market can be detected. To address this issue, we choose firms’ credit rating transition dynamics as a proxy for the credit market and discuss how statistical process control tools can be used to surveil structural breaks in firms’ rating transition dynamics. After reviewing some commonly used Markovian models for firms’ rating transition dynamics, we present several surveillance rules for detecting changes in generators of firms’ rating migration matrices, including the likelihood ratio rule, the generalized likelihood ratio rule, the extended Shiryaev detection rule, and a Bayesian detection rule for piecewise homogeneous Markovian models. The effectiveness of these rules is analyzed using Monte Carlo simulations. We also provide a real example that uses the surveillance rules to analyze and detect structural breaks in the monthly credit rating migration of U.S. firms from January 1986 to February 2017.
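The abstract only names the detection rules. As a rough illustration of the general idea behind likelihood-ratio-based surveillance (a plain CUSUM rule for a univariate stream with fully specified pre- and post-change densities, not the authors' rules for rating-migration generators; the densities f0 and f1, the threshold h, and the data below are hypothetical), consider:

    import numpy as np
    from scipy import stats

    def cusum_alarm_time(x, f0, f1, h):
        """Generic CUSUM rule: raise an alarm the first time the statistic
        S_t = max(0, S_{t-1} + log f1(x_t) - log f0(x_t)) exceeds h."""
        s = 0.0
        for t, xt in enumerate(x):
            s = max(0.0, s + np.log(f1(xt)) - np.log(f0(xt)))
            if s > h:
                return t
        return None

    # Hypothetical stream whose mean shifts from 0 to 1 at index 200.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.0, 1.0, 100)])
    alarm = cusum_alarm_time(x,
                             f0=lambda v: stats.norm.pdf(v, 0.0, 1.0),
                             f1=lambda v: stats.norm.pdf(v, 1.0, 1.0),
                             h=5.0)
    print("alarm raised at index:", alarm)

The Shiryaev-type and Bayesian rules mentioned above replace this max-type recursion with posterior-probability recursions, but the monitoring structure is the same: update a detection statistic after every observation and stop at the first threshold crossing.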
Mean-Variance Portfolio Optimization When Means and Covariances Are Unknown
Markowitz's celebrated mean-variance portfolio optimization theory assumes that the means and covariances of the underlying asset returns are known. In practice, they are unknown and have to be estimated from historical data. Plugging the estimates into the efficient frontier that assumes known parameters has led to portfolios that may perform poorly and have counterintuitive asset allocation weights; this has been referred to as the “Markowitz optimization enigma.” After reviewing different approaches in the literature to address these difficulties, we explain the root cause of the enigma and propose a new approach to resolve it. Not only is the new approach shown to provide substantial improvements over previous methods, but it also allows flexible modeling to incorporate dynamic features and fundamental analysis of the training sample of historical data, as illustrated in simulation and empirical studies.
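For reference, the classical problem the abstract refers to chooses portfolio weights \(w\), given a known mean vector \(\mu\) and covariance matrix \(\Sigma\) of the asset returns, to solve

\[
\min_{w}\; w^{\top}\Sigma\, w \quad \text{subject to} \quad w^{\top}\mu = \mu_{*}, \qquad w^{\top}\mathbf{1} = 1,
\]

for each target return \(\mu_{*}\); the plug-in frontier criticized above simply substitutes sample estimates \(\hat{\mu}\) and \(\hat{\Sigma}\) for the unknown \(\mu\) and \(\Sigma\).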
Genome-Wide Localization of Protein-DNA Binding and Histone Modification by a Bayesian Change-Point Method with ChIP-seq Data
Next-generation sequencing (NGS) technologies have matured considerably since their introduction and a focus has been placed on developing sophisticated analytical tools to deal with the amassing volumes of data. Chromatin immunoprecipitation sequencing (ChIP-seq), a major application of NGS, is a widely adopted technique for examining protein-DNA interactions and is commonly used to investigate epigenetic signatures of diffuse histone marks. These datasets have notoriously high variance and subtle levels of enrichment across large expanses, making them exceedingly difficult to define. Window-based heuristic models and finite-state hidden Markov models (HMMs) have been used with some success in analyzing ChIP-seq data but with lingering limitations. To improve the ability to detect broad regions of enrichment, we developed a stochastic Bayesian Change-Point (BCP) method, which addresses some of these unresolved issues. BCP makes use of recent advances in infinite-state HMMs by obtaining explicit formulas for posterior means of read densities. These posterior means can be used to categorize the genome into enriched and unenriched segments, as is customarily done, or examined for more detailed relationships, since the underlying subpeaks are preserved rather than simplified into a binary classification. BCP performs a near-exhaustive search of all possible change points between different posterior means at high resolution to minimize the subjectivity of window sizes and is computationally efficient, due to a speed-up algorithm and the explicit formulas it employs. In the absence of a well-established “gold standard” for diffuse histone mark enrichment, we corroborated BCP's island detection accuracy and reproducibility using various forms of empirical evidence. We show that BCP is especially suited for analysis of diffuse histone ChIP-seq data but also effective in analyzing punctate transcription factor ChIP datasets, making it widely applicable for numerous experiment types.
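BCP's posterior-mean formulas are not given in the abstract. Purely to illustrate what change-point segmentation of a binned read-density track looks like (a simple least-squares binary segmentation on synthetic counts, not the BCP algorithm; the counts, enrichment levels, and min_gain threshold are hypothetical), a minimal sketch:

    import numpy as np

    def best_split(y):
        """Return (index, gain) of the single split that most reduces the
        within-segment sum of squared errors of a 1-D signal y."""
        n = len(y)
        total_sse = np.sum((y - y.mean()) ** 2)
        best_k, best_gain = None, 0.0
        for k in range(2, n - 1):
            left, right = y[:k], y[k:]
            sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
            gain = total_sse - sse
            if gain > best_gain:
                best_k, best_gain = k, gain
        return best_k, best_gain

    def binary_segmentation(y, min_gain=200.0, offset=0, breaks=None):
        """Recursively split y wherever the SSE reduction exceeds min_gain."""
        if breaks is None:
            breaks = []
        if len(y) < 4:
            return breaks
        k, gain = best_split(y)
        if k is not None and gain > min_gain:
            breaks.append(offset + k)
            binary_segmentation(y[:k], min_gain, offset, breaks)
            binary_segmentation(y[k:], min_gain, offset + k, breaks)
        return sorted(breaks)

    # Hypothetical binned read counts: background level 5 with an enriched block at level 20.
    rng = np.random.default_rng(1)
    counts = np.concatenate([rng.poisson(5, 300), rng.poisson(20, 100), rng.poisson(5, 300)]).astype(float)
    print("estimated change points (bin indices):", binary_segmentation(counts))

BCP, by contrast, works with posterior means over change-point configurations rather than committing to a single greedy split sequence, as described in the abstract.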
Firm’s Credit Risk in the Presence of Market Structural Breaks
The financial crises that occurred in the last several decades have demonstrated the significant impact of market structural breaks on firms’ credit behavior. To incorporate the impact of market structural breaks into the analysis of firms’ credit rating transitions and firms’ asset structure, we develop a continuous-time modulated Markov model for firms’ credit rating transitions with unobserved market structural breaks. The model takes a semi-parametric multiplicative regression form, in which the effects of firms’ observable covariates and macroeconomic variables are represented parametrically and nonparametrically, respectively, and the frailty effects of unobserved firm-specific and market-wide variables are incorporated via the integration form of the model assumption. We further develop a mixture-estimating-equation approach to make inference on the effect of market variations, baseline intensities of all firms’ credit rating transitions, and rating transition intensities for each individual firm. We then use the developed model and inference procedure to analyze the monthly credit ratings of U.S. firms from January 1986 to December 2012, and study the effect of market structural breaks on firms’ credit rating transitions.
Estimation of Parent Specific DNA Copy Number in Tumors using High-Density Genotyping Arrays
Chromosomal gains and losses comprise an important type of genetic change in tumors, and can now be assayed using microarray hybridization-based experiments. Most current statistical models for DNA copy number estimate total copy number, which does not distinguish between the underlying quantities of the two inherited chromosomes. This latter information, sometimes called parent-specific copy number, is important for identifying allele-specific amplifications and deletions, for quantifying normal cell contamination, and for giving a more complete molecular portrait of the tumor. We propose a stochastic segmentation model for parent-specific DNA copy number in tumor samples, and give an estimation procedure that is computationally efficient and can be applied to data from the current high-density genotyping platforms. The proposed method does not require matched normal samples, and can estimate the unknown genotypes simultaneously with the parent-specific copy number. The new method is used to analyze 223 glioblastoma samples from The Cancer Genome Atlas (TCGA) project, giving a more comprehensive summary of the copy number events in these samples. Detailed case studies on these samples reveal the additional insights that can be gained from an allele-specific copy number analysis, such as the quantification of fractional gains and losses, the identification of copy-neutral loss of heterozygosity, and the characterization of regions of simultaneous changes of both inherited chromosomes.
A singular stochastic control approach for optimal pairs trading with proportional transaction costs
Optimal trading strategies for pairs trading have been studied by models that try to find either optimal shares of stocks by assuming no transaction costs or optimal timing of trading fixed numbers of shares of stocks with transaction costs. To find optimal strategies that determine optimally both trade times and number of shares in a pairs trading process, we use a singular stochastic control approach to study an optimal pairs trading problem with proportional transaction costs. Assuming a cointegrated relationship for a pair of stock log-prices, we consider a portfolio optimization problem that involves dynamic trading strategies with proportional transaction costs. We show that the value function of the control problem is the unique viscosity solution of a nonlinear quasi-variational inequality, which is equivalent to a free boundary problem for the singular stochastic control value function. We then develop a discrete time dynamic programming algorithm to compute the transaction regions, and show the convergence of the discretization scheme. We illustrate our approach with numerical examples and discuss the impact of different parameters on transaction regions. We study the out-of-sample performance in an empirical study that consists of six pairs of U.S. stocks selected from different industry sectors, and demonstrate the efficiency of the optimal strategy.
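The abstract assumes a cointegrated relationship between the pair of stock log-prices. A common concrete specification of such a relationship (given only as an illustration, with γ, κ, θ, and σ as generic symbols rather than the paper's notation) treats the spread as a mean-reverting Ornstein-Uhlenbeck process,

\[
z_t \;=\; \log P^{(1)}_t - \gamma \log P^{(2)}_t, \qquad dz_t \;=\; \kappa\,(\theta - z_t)\,dt + \sigma\,dW_t,
\]

so that positions are opened when \(z_t\) drifts away from its long-run level \(\theta\) and unwound when it reverts; in the paper, the transaction regions obtained from the free boundary problem play the role of these open and close boundaries while also accounting for the proportional transaction costs.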