Catalogue Search | MBRL
448 result(s) for "Yang, Greg"
A formal notion of genericity and term-by-term vanishing superpotentials at supersymmetric vacua from R-symmetric Wess-Zumino models
by Yang, Greg; Sun, Zheng; Brister, James
in Classical and Quantum Gravitation, Elementary Particles, Global Symmetries
2021
Abstract
It is known in previous literature that if a Wess-Zumino model with an R-symmetry gives a supersymmetric vacuum, the superpotential vanishes at the vacuum. In this work, we establish a formal notion of genericity, and show that if the R-symmetric superpotential has generic coefficients, the superpotential vanishes term-by-term at a supersymmetric vacuum. This result constrains the form of the superpotential which leads to a supersymmetric vacuum. It may contribute to a refined classification of R-symmetric Wess-Zumino models, and find applications in string constructions of vacua with small superpotentials. A similar result for a scalar potential system with a scaling symmetry is discussed.
Journal Article
Do financial innovations influence bank performance? Evidence from China
2024
Purpose
The rapid growth of Fintech presents a growing challenge for banking institutions, particularly those with more traditional, service backgrounds. This paper aims to examine the relationship between Fintech innovation and bank performance by exploiting novel Chinese market data.
Design/methodology/approach
Guided by the work of Dietrich and Wanzenried (2011, 2014) and Phan et al. (2019), the authors construct a regression model to investigate the effect of Fintech innovation on the profitability of Chinese listed banks. The authors include their measures of Fintech innovation in each of their selected structures.
Findings
Results indicate that Fintech innovation is negatively associated with bank performance and that state-owned banks, joint-stock commercial banks and long-established banks are more negatively impacted by Fintech innovation relative to city and rural commercial banks and younger banks.
Originality/value
Risk tolerance levels, internal structure and efficiency and recent debt repayment performance channels are each shown to be significant, robust explanatory factors underpinning such results.
Journal Article
The dynamics of price discovery for cross-listed stocks evidence from US and Chinese markets
by Scrimgeour, Frank; Duppati, Geeta; Hou, Yang
in American Depositary Receipts, Bias, Causality
2017
Purpose: This study examines how, and to what extent, the trading of cross-listed China-backed ADRs on the New York Stock Exchange (NYSE) contributes to the information flow and price discovery for the corresponding cross-listed stocks on the Shanghai Stock Exchange (SSE). Design/methodology/approach: The study uses the information share, the Granger causality test, the vector error correction model, the Permanent-Temporary Gonzalo-Granger (PT/GG) method and the bivariate DCC-EGARCH model to examine the price discovery dynamics across the cross-listed stocks. Findings: The Granger causality tests show two-way (feedback) transmission between the Chinese and US markets, with the effects from the NYSE to the SSE larger than those in the opposite direction. The bivariate DCC-EGARCH results indicate that the volatility spillover from the NYSE is larger than that from the SSE. Practical implications: In contrast to previous studies that showed very little contribution to price discovery by Chinese ADRs on the NYSE, the present study indicates that this contribution has increased relative to the past, suggesting the importance of changing time frames and economic situations. Originality/value: The study differentiates between long-term and short-term price discovery effects and finds that home country bias persists in the long term, while in the short term the information from the cross-listed China-backed ADRs on the NYSE affects price discovery for SSE stocks.
Journal Article
Computability of validity and satisfiability in probability logics over finite and countable models
2015
The \(\varepsilon\)-logic (which is called \(\varepsilon\)E-logic in this paper) of Terwijn is a variant of first-order logic (FOL) with the same syntax in which the models are equipped with probability measures and the \(\forall x\) quantifier is interpreted as 'there exists a set A of measure \(\geq 1 - \varepsilon\) such that for each \(x \in A\), ...'. Previously, Kuyper and Terwijn proved that the general satisfiability and validity problems for this logic are, i) for rational \(\varepsilon \in (0, 1)\), respectively \(\Sigma^1_1\)-complete and \(\Pi^1_1\)-hard, and ii) for \(\varepsilon = 0\), respectively decidable and \(\Sigma^0_1\)-complete. The adjective 'general' here means 'uniformly over all languages'. We extend these results in the scenario of finite models. In particular, we show that the problems of satisfiability and validity with respect to finite models in \(\varepsilon\)E-logic are, i) for rational \(\varepsilon \in (0, 1)\), respectively \(\Sigma^0_1\)-complete and \(\Pi^0_1\)-complete, and ii) for \(\varepsilon = 0\), respectively decidable and \(\Pi^0_1\)-complete. Although partial results toward the countable case are also achieved, the computability of \(\varepsilon\)E-logic over countable models still remains largely unsolved. In addition, most of the results here and of Kuyper and Terwijn do not apply to individual languages with a finite number of unary predicates. Reducing this requirement continues to be a major point of research. On the positive side, we derive the decidability of the corresponding problems for monadic relational languages, that is, equality- and function-free languages with finitely many unary and arbitrarily many nullary predicates. This result holds for all three of the unrestricted, countable, and finite-model cases. Applications in computational learning theory (CLT), weighted graphs, and artificial neural networks (ANNs) are discussed in the context of these decidability and undecidability results.
Journal Article
Tensor Programs III: Neural Matrix Laws
2021
In a neural network (NN), *weight matrices* linearly transform inputs into *preactivations* that are then transformed nonlinearly into *activations*. A typical NN interleaves multitudes of such linear and nonlinear transforms to express complex functions. Thus, the (pre-)activations depend on the weights in an intricate manner. We show that, surprisingly, (pre-)activations of a randomly initialized NN become *independent* from the weights as the NN's widths tend to infinity, in the sense of asymptotic freeness in random matrix theory. We call this the Free Independence Principle (FIP), which has these consequences: 1) It rigorously justifies the calculation of the asymptotic Jacobian singular value distribution of an NN in Pennington et al. [36,37], essential for training ultra-deep NNs [48]. 2) It gives a new justification of the gradient independence assumption used for calculating the Neural Tangent Kernel of a neural network. FIP and these results hold for any neural architecture. We show FIP by proving a Master Theorem for any Tensor Program, as introduced in Yang [50,51], generalizing the Master Theorems proved in those works. As warmup demonstrations of this new Master Theorem, we give new proofs of the semicircle and Marchenko-Pastur laws, which benchmark our framework against these fundamental mathematical results.
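The semicircle law mentioned in this abstract can be checked numerically in a few lines. The sketch below is not taken from the paper: it is a plain Monte Carlo illustration, assuming a symmetric matrix with independent Gaussian entries of variance 1/n, and the function names `wigner_matrix` and `normalized_trace_moments` are hypothetical. It relies on the known fact that the semicircle distribution on [-2, 2] has second moment 1 and fourth moment 2.

```python
import random


def wigner_matrix(n, rng):
    """Symmetric n x n matrix with Gaussian entries of variance 1/n (illustrative choice)."""
    a = [[0.0] * n for _ in range(n)]
    scale = n ** -0.5
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0) * scale
    return a


def normalized_trace_moments(a):
    """Return (tr(A^2)/n, tr(A^4)/n) for a symmetric matrix A."""
    n = len(a)
    # For symmetric A, tr(A^2) is the sum of squared entries.
    m2 = sum(v * v for row in a for v in row) / n
    # For symmetric A, tr(A^4) is the squared Frobenius norm of A^2.
    cols = list(zip(*a))
    m4 = sum(
        sum(x * y for x, y in zip(row, col)) ** 2
        for row in a
        for col in cols
    ) / n
    return m2, m4


if __name__ == "__main__":
    m2, m4 = normalized_trace_moments(wigner_matrix(100, random.Random(1)))
    print(f"tr(A^2)/n = {m2:.3f}, tr(A^4)/n = {m4:.3f}")
```

For moderate n the two printed values should sit close to 1 and 2, with deviations shrinking as n grows. Matching moments is only a coarse check of the law, not a proof of convergence.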
Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes
2021
Wide neural networks with random weights and biases are Gaussian processes, as originally observed by Neal (1995) and more recently by Lee et al. (2018) and Matthews et al. (2018) for deep fully-connected networks, as well as by Novak et al. (2019) and Garriga-Alonso et al. (2019) for deep convolutional networks. We show that this Neural Network-Gaussian Process correspondence surprisingly extends to all modern feedforward or recurrent neural networks composed of multilayer perceptron, RNNs (e.g. LSTMs, GRUs), (nD or graph) convolution, pooling, skip connection, attention, batch normalization, and/or layer normalization. More generally, we introduce a language for expressing neural network computations, and our result encompasses all such expressible neural networks. This work serves as a tutorial on the *tensor programs* technique formulated in Yang (2019) and elucidates the Gaussian Process results obtained there. We provide open-source implementations of the Gaussian Process kernels of simple RNN, GRU, transformer, and batchnorm+ReLU network at github.com/thegregyang/GP4A.
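The simplest instance of the correspondence described in this abstract can be illustrated numerically. The sketch below is not the paper's tensor-program machinery and not the code at github.com/thegregyang/GP4A. It assumes a single hidden ReLU layer without biases and standard Gaussian weights, and the names `empirical_kernel` and `arccos_kernel` are hypothetical. It compares the width-averaged product of hidden activations with the known closed form for E[relu(w·x) relu(w·y)], which is the covariance of the limiting Gaussian process in this special case.

```python
import math
import random


def relu(z):
    return max(z, 0.0)


def empirical_kernel(x, y, width, rng):
    """Average of relu(w.x) * relu(w.y) over `width` hidden units with w ~ N(0, I)."""
    total = 0.0
    for _ in range(width):
        w = [rng.gauss(0.0, 1.0) for _ in x]
        pre_x = sum(wi * xi for wi, xi in zip(w, x))
        pre_y = sum(wi * yi for wi, yi in zip(w, y))
        total += relu(pre_x) * relu(pre_y)
    return total / width


def arccos_kernel(x, y):
    """Closed form of E[relu(w.x) * relu(w.y)] for w ~ N(0, I)."""
    nx = math.sqrt(sum(v * v for v in x))
    ny = math.sqrt(sum(v * v for v in y))
    cos_t = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    theta = math.acos(cos_t)
    return nx * ny * (math.sin(theta) + (math.pi - theta) * cos_t) / (2.0 * math.pi)


if __name__ == "__main__":
    x, y = [1.0, 0.0], [0.6, 0.8]
    for width in (100, 1000, 20000):
        est = empirical_kernel(x, y, width, random.Random(0))
        print(width, round(est, 4), round(arccos_kernel(x, y), 4))
```

As the width grows, the empirical average should settle near the closed-form value. Deeper or more complex architectures require the recursive kernel computations the paper describes, which this sketch does not attempt.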
Tensor Programs II: Neural Tangent Kernel for Any Architecture
2020
We prove that a randomly initialized neural network of *any architecture* has its Tangent Kernel (NTK) converge to a deterministic limit, as the network widths tend to infinity. We demonstrate how to calculate this limit. In prior literature, the heuristic study of neural network gradients often assumes every weight matrix used in forward propagation is independent from its transpose used in backpropagation (Schoenholz et al. 2017). This is known as the *gradient independence assumption (GIA)*. We identify a commonly satisfied condition, which we call *Simple GIA Check*, such that the NTK limit calculation based on GIA is correct. Conversely, when Simple GIA Check fails, we show GIA can result in wrong answers. Our material here presents the NTK results of Yang (2019a) in a friendly manner and showcases the *tensor programs* technique for understanding wide neural networks. We provide reference implementations of infinite-width NTKs of recurrent neural network, transformer, and batch normalization at https://github.com/thegregyang/NTK4A.
Width and Depth Limits Commute in Residual Networks
2023
We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by \(1/\text{depth}\) (the only nontrivial scaling), results in the same covariance structure no matter how that limit is taken. This explains why the standard infinite-width-then-depth approach provides practical insights even for networks with depth of the same order as width. We also demonstrate that the pre-activations, in this case, have Gaussian distributions, which has direct applications in Bayesian deep learning. We conduct extensive simulations that show an excellent match with our theoretical findings.
Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit
2023
Going beyond stochastic gradient descent (SGD), what new phenomena emerge in wide neural networks trained by adaptive optimizers like Adam? Here we show: The same dichotomy between feature learning and kernel behaviors (as in SGD) holds for general optimizers as well, including Adam -- albeit with a nonlinear notion of "kernel." We derive the corresponding "neural tangent" and "maximal update" limits for any architecture. Two foundational advances underlie the above results: 1) A new Tensor Program language, NEXORT, that can express how adaptive optimizers process gradients into updates. 2) The introduction of bra-ket notation to drastically simplify expressions and calculations in Tensor Programs. This work summarizes and generalizes all previous results in the Tensor Programs series of papers.
Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation
2020
Several recent trends in machine learning theory and practice, from the design of state-of-the-art Gaussian Process to the convergence analysis of deep neural nets (DNNs) under stochastic gradient descent (SGD), have found it fruitful to study wide random neural networks. Central to these approaches are certain scaling limits of such networks. We unify these results by introducing a notion of a straightline tensor program that can express most neural network computations, and we characterize its scaling limit when its tensors are large and randomized. From our framework follows (1) the convergence of random neural networks to Gaussian processes for architectures such as recurrent neural networks, convolutional neural networks, residual networks, attention, and any combination thereof, with or without batch normalization; (2) conditions under which the gradient independence assumption -- that weights in backpropagation can be assumed to be independent from weights in the forward pass -- leads to correct computation of gradient dynamics, and corrections when it does not; (3) the convergence of the Neural Tangent Kernel, a recently proposed kernel used to predict training dynamics of neural networks under gradient descent, at initialization for all architectures in (1) without batch normalization. Mathematically, our framework is general enough to rederive classical random matrix results such as the semicircle and the Marchenko-Pastur laws, as well as recent results in neural network Jacobian singular values. We hope our work opens a way toward design of even stronger Gaussian Processes, initialization schemes to avoid gradient explosion/vanishing, and deeper understanding of SGD dynamics in modern architectures.