Catalogue Search | MBRL
111 result(s) for "Luk, Wayne"
Learning spatial hearing via innate mechanisms
2025
The acoustic cues used by humans and other animals to localise sounds are subtle, and change throughout our lifetime. This means that we need to constantly relearn or recalibrate our sound localisation circuit. This is often thought of as a “supervised” learning process where a “teacher” (for example, a parent, or your visual system) tells you whether or not you guessed the location correctly, and you use this information to update your localiser. However, there is not always an obvious teacher (for example in babies or blind people). Using computational models, we showed that approximate feedback from a simple innate circuit, such as one that can distinguish left from right (e.g. the auditory orienting response), is sufficient to learn an accurate full-range sound localiser. Moreover, using this mechanism in addition to supervised learning can more robustly maintain the adaptive neural representation. We identify several possible neural mechanisms that could underlie this type of learning, hypothesise that multiple mechanisms may be present, and provide examples in which these mechanisms can interact with each other. We conclude that when studying spatial hearing, we should not assume that the only source of learning is the visual system or other supervisory signals. Further study of the proposed mechanisms could allow us to design better rehabilitation programmes to accelerate relearning or recalibration of spatial hearing.
Journal Article
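To make the learning setting concrete, here is a minimal sketch (an illustration of the idea, not the authors' model) in which the only teaching signal is a binary left/right judgement standing in for an innate orienting response; the cue model, learning rule, and all parameter values are assumptions.

```python
import numpy as np

# Toy localiser trained only from a binary left/right "innate" signal,
# rather than the true azimuth (a hypothetical illustration of the idea).
rng = np.random.default_rng(0)

def cues(azimuth_deg):
    """Toy binaural cues (ITD/ILD-like), varying smoothly with azimuth."""
    itd = np.sin(np.radians(azimuth_deg))
    ild = 0.5 * np.sin(np.radians(azimuth_deg))
    return np.array([itd, ild]) + rng.normal(0, 0.01, 2)

w = np.zeros(2)                       # linear read-out weights
for _ in range(5000):
    az = rng.uniform(-90, 90)         # true source azimuth (never shown to the learner)
    x = cues(az)
    estimate = w @ x                  # current azimuth estimate
    teacher = np.sign(az)             # innate circuit: left (-1) or right (+1) only
    if np.sign(estimate) != teacher:  # no graded error signal is available
        w += 0.1 * teacher * x        # nudge the estimate toward the correct side

# Because the cues vary smoothly with azimuth, sign-only feedback is enough to
# orient the read-out so that it increases monotonically with source azimuth.
estimates = [w @ cues(a) for a in range(-90, 91)]
print("correlation with true azimuth:", np.corrcoef(estimates, range(-90, 91))[0, 1])
```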
ProtoPGTN: A Scalable Prototype-Based Gated Transformer Network for Interpretable Time Series Classification
2025
Time Series Classification (TSC) plays a crucial role in machine learning applications across domains such as healthcare, finance, and industrial systems. In these domains, TSC requires accurate predictions and reliable explanations, as misclassifications may lead to severe consequences. In addition, scalability issues, including training time and memory consumption, are critical for practical use. To address these challenges, we propose ProtoPGTN, a prototype-based interpretable framework that unifies gated transformers with prototype reasoning for scalable time series classification. Unlike existing prototype-based interpretable TSC models, which rely on recurrent structures for sequence processing and Euclidean distance for similarity computation, ProtoPGTN adapts Gated Transformer Networks (GTN), which use an attention mechanism to capture both temporal and spatial long-range dependencies in time series data, and integrates the prototype learning framework from ProtoPNet with cosine similarity to enhance metric consistency and interpretability. Extensive experiments are conducted on 165 publicly available datasets from the UCR and UEA repositories, covering both univariate and multivariate tasks. Results show that ProtoPGTN matches or exceeds the performance of existing prototype-based interpretable models on both multivariate and univariate datasets, with average accuracies of 67.69% and 76.99%, respectively. ProtoPGTN also achieves up to 20× faster training and up to 200× lower memory consumption than existing prototype-based interpretable models.
Journal Article
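As a rough illustration of the prototype read-out described in the abstract (not the published ProtoPGTN code), the sketch below replaces the gated-transformer encoder with a fixed random projection and classifies a series by its cosine similarity to a set of prototype vectors; every name and dimension here is hypothetical.

```python
import numpy as np

# Illustrative sketch of a prototype-similarity read-out; the encoder is a
# stand-in for the gated transformer, and the prototypes are random here
# (in the real model they are learned).
rng = np.random.default_rng(1)

def encode(series, proj):
    """Stand-in encoder: project each time step and pool over time."""
    return np.tanh(series @ proj).mean(axis=0)          # latent vector of size d_model

def cosine_similarity(z, prototypes):
    z = z / np.linalg.norm(z)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return p @ z                                          # one score per prototype

d_in, d_model, n_proto, n_classes = 3, 16, 8, 2
proj = rng.normal(size=(d_in, d_model))
prototypes = rng.normal(size=(n_proto, d_model))
class_weights = rng.normal(size=(n_classes, n_proto))

series = rng.normal(size=(100, d_in))                     # one multivariate time series
sims = cosine_similarity(encode(series, proj), prototypes)
logits = class_weights @ sims
print("predicted class:", int(np.argmax(logits)))
# Interpretability comes from `sims`: each score says how close the input's
# latent representation is to a particular prototype.
```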
Scalable Time Series Causal Discovery with Approximate Causal Ordering
2025
Causal discovery in time series data presents a significant computational challenge. Standard algorithms are often prohibitively expensive for datasets with many variables or samples. This study introduces and validates a heuristic approximation of the VarLiNGAM algorithm to address this scalability problem. The standard VarLiNGAM method relies on an iterative refinement procedure for causal ordering that is computationally expensive. Our heuristic modifies this procedure by omitting the iterative refinement. This change permits a one-time precomputation of all necessary statistical values. The algorithmic modification reduces the time complexity of VarLiNGAM from O(m³n) to O(m²n + m³) while keeping the space complexity at O(m²), where m is the number of variables and n is the number of samples. While an approximation, our approach retains VarLiNGAM’s essential structure and empirical reliability. On large-scale financial data with up to 400 variables, our algorithm achieves up to a 13.36× speedup over the standard implementation and an approximate 4.5× speedup over a GPU-accelerated version. Evaluations across medical time series analysis, IT service monitoring, and finance demonstrate the heuristic’s robustness and practical scalability. This work offers a validated balance between computational efficiency and discovery quality, making large-scale causal analysis feasible on personal computers.
Journal Article
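The following is a deliberately simplified sketch of the general idea, precomputing pairwise statistics in a single pass and ordering variables without iterative residual refinement; the kurtosis-based score is a crude stand-in, not the measure used by VarLiNGAM or by the authors.

```python
import numpy as np

# Toy linear, non-Gaussian data with ground-truth causal order x0 -> x1 -> x2.
rng = np.random.default_rng(2)
n = 5000
x0 = rng.laplace(size=n)
x1 = 0.8 * x0 + rng.laplace(size=n)
x2 = 0.6 * x1 + rng.laplace(size=n)
X = np.column_stack([x2, x0, x1])     # columns shuffled on purpose

def excess_kurtosis(v):
    z = (v - v.mean()) / v.std()
    return np.mean(z ** 4) - 3.0

def pairwise_score(xi, xj):
    """Positive if xi looks more like a cause of xj than the reverse (crude heuristic)."""
    fwd = xj - (np.cov(xi, xj)[0, 1] / np.var(xi)) * xi   # residual of xj regressed on xi
    rev = xi - (np.cov(xi, xj)[0, 1] / np.var(xj)) * xj   # residual of xi regressed on xj
    return abs(excess_kurtosis(fwd)) - abs(excess_kurtosis(rev))

# One pass over all pairs; no re-estimation of residuals after picking a root.
m = X.shape[1]
scores = np.zeros(m)
for i in range(m):
    for j in range(m):
        if i != j:
            scores[i] += pairwise_score(X[:, i], X[:, j])

order = list(np.argsort(-scores))     # most "exogenous" column first
print("estimated causal order of columns:", order)
```

On this toy data the column holding x0 should be ranked first; the point of the sketch is only that a single precomputation pass replaces the expensive iterative refinement.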
Acceleration of a Deep Neural Network for the Compact Muon Solenoid
by Tapper, Alex; Barbone, Marco; Bainbridge, Robert
in Algorithms; Artificial neural networks; Design optimization
2024
There are ongoing efforts to investigate theories that aim to explain the current shortcomings of the Standard Model of particle physics. One such effort is the Long-Lived Particle Jet Tagging Algorithm, based on a DNN (Deep Neural Network), which is used to search for exotic new particles. This paper describes two novel optimisations in the design of this DNN, suitable for implementation on an FPGA-based accelerator. The first involves the adoption of cyclic random access memories and the reuse of multiply-accumulate operations. The second involves storing matrices distributed over many RAMs, with elements grouped by index. An evaluation of the proposed methods and hardware architectures is also included. The proposed optimisations can yield performance enhancements of more than an order of magnitude compared to software implementations. The innovations also lead to smaller FPGA footprints and accordingly reduce power consumption, allowing, for instance, duplication of compute units to achieve increases in effective throughput.
Journal Article
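The second optimisation, spreading matrix elements across many memories grouped by index so that one element per bank can be fetched each cycle, can be mimicked in software roughly as follows (a hypothetical illustration; the bank count and sizes are arbitrary assumptions).

```python
import numpy as np

# Sketch of the data-layout idea only, not the paper's hardware design:
# spread matrix columns across independent memory banks so one element per
# bank can be read in the same cycle and fed to parallel multiply-accumulates.
rng = np.random.default_rng(3)

n_banks = 4
W = rng.normal(size=(8, 8))                          # layer weight matrix
x = rng.normal(size=8)

# Bank b holds the columns whose index is congruent to b modulo n_banks.
banks = [W[:, b::n_banks] for b in range(n_banks)]
x_banks = [x[b::n_banks] for b in range(n_banks)]

# Each "cycle" every bank contributes one partial multiply-accumulate;
# the accumulator is reused across cycles instead of widening the datapath.
acc = np.zeros(W.shape[0])
for step in range(banks[0].shape[1]):
    for b in range(n_banks):
        acc += banks[b][:, step] * x_banks[b][step]

assert np.allclose(acc, W @ x)                       # same result as a dense product
print("matrix-vector product reproduced with banked storage")
```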
A Real-Time Tree Crown Detection Approach for Large-Scale Remote Sensing Images on FPGAs
by Dong, Runmin; Xia, Maocai; Luk, Wayne
in Algorithms; Artificial intelligence; Computer applications
2019
On-board real-time tree crown detection from high-resolution remote sensing images is beneficial for avoiding the delay between data acquisition and processing, reducing the quantity of data transmitted from the satellite to the ground, monitoring the growing condition of individual trees, and discovering damage to trees as early as possible. Existing tree crown detection studies on high-performance platforms either focus on processing small images or suffer from high power consumption or slow processing speed. In this paper, we propose the first FPGA-based real-time tree crown detection approach for large-scale satellite images. A pipeline-friendly and resource-economical tree crown detection algorithm (PF-TCD) is designed by reconstructing and modifying the workflow of the original algorithm into three computational kernels on FPGAs. Compared with a well-optimized software implementation of the original algorithm on an Intel 12-core CPU, our proposed PF-TCD achieves a speedup of 18.75 times for a satellite image with a size of 12,188 × 12,576 pixels without reducing the detection accuracy. The image processing time for the large-scale remote sensing image is only 0.33 s, which satisfies the requirements of on-board real-time data processing on satellites.
Journal Article
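A rough software analogue of the three-kernel, pipeline-friendly restructuring might look like the sketch below; the kernel contents (smoothing, local-maximum detection, counting) and the tile size are invented for illustration and are not the PF-TCD kernels.

```python
import numpy as np

rng = np.random.default_rng(4)
image = rng.random((1024, 1024)).astype(np.float32)    # stand-in for a satellite scene
TILE = 256

def tiles(img, size):
    """Front-end: stream the large image as fixed-size tiles."""
    for r in range(0, img.shape[0], size):
        for c in range(0, img.shape[1], size):
            yield img[r:r + size, c:c + size]

def smooth(tile):
    """Kernel 1: 3x3 box filter."""
    out = np.zeros_like(tile)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            out += np.roll(np.roll(tile, dr, axis=0), dc, axis=1)
    return out / 9.0

def local_maxima(tile, thresh=0.7):
    """Kernel 2: pixels brighter than all 8 neighbours and above a threshold."""
    neighbours = [np.roll(np.roll(tile, dr, axis=0), dc, axis=1)
                  for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    return (tile > np.maximum.reduce(neighbours)) & (tile > thresh)

def count_candidates(mask):
    """Kernel 3: aggregate candidate crown positions per tile."""
    return int(mask.sum())

total = sum(count_candidates(local_maxima(smooth(t))) for t in tiles(image, TILE))
print("candidate tree crowns detected:", total)
```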
Distributed large-scale graph processing on FPGAs
by Sahebi, Amin; Procaccini, Marco; Barbone, Marco
in Apexes; Big Data; Central processing units
2023
Processing large-scale graphs is challenging due to the nature of the computation, which causes irregular memory access patterns. Managing such irregular accesses may cause significant performance degradation on both CPUs and GPUs. Thus, recent research trends propose graph processing acceleration with Field-Programmable Gate Arrays (FPGAs). FPGAs are programmable hardware devices that can be fully customised to perform specific tasks in a highly parallel and efficient manner. However, FPGAs have a limited amount of on-chip memory that cannot fit the entire graph. Due to the limited device memory size, data needs to be repeatedly transferred to and from the FPGA on-chip memory, which makes data transfer time dominate over the computation time. A possible way to overcome the FPGA accelerators’ resource limitation is to employ a multi-FPGA distributed architecture and use an efficient partitioning scheme. Such a scheme aims to increase data locality and minimise communication between different partitions. This work proposes an FPGA processing engine that overlaps, hides and customises all data transfers so that the FPGA accelerator is fully utilised. This engine is integrated into a framework for using FPGA clusters and is able to use an offline partitioning method to facilitate the distribution of large-scale graphs. The proposed framework uses Hadoop at a higher level to map a graph to the underlying hardware platform. The higher layer of computation is responsible for gathering the blocks of data that have been pre-processed and stored on the host’s file system and for distributing them to a lower layer of computation made of FPGAs. We show how graph partitioning combined with an FPGA architecture leads to high performance, even when the graph has millions of vertices and billions of edges. For the PageRank algorithm, widely used for ranking the importance of nodes in a graph, our implementation is the fastest compared to state-of-the-art CPU and GPU solutions, achieving a speedup of 13× compared to 8× and 3×, respectively. Moreover, for large-scale graphs the GPU solution fails due to memory limitations, while the CPU solution achieves a speedup of 12× compared to the 26× achieved by our FPGA solution. Other state-of-the-art FPGA solutions are 28 times slower than our proposed solution. When the size of a graph limits the performance of a single FPGA device, our performance model shows that using multiple FPGAs in a distributed system can further improve performance by about 12×. This highlights the efficiency of our implementation for large datasets that do not fit in the on-chip memory of a hardware device.
Journal Article
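The partitioning idea, grouping edges offline into blocks so that each block can be streamed into a small on-chip memory and processed independently, can be sketched as follows for PageRank; the grid partitioning, graph size, and damping constants are illustrative assumptions rather than the authors' framework.

```python
import numpy as np

# Toy edge-block partitioning for PageRank: edges are grouped by
# (source range, destination range) so each block can be processed on its own.
rng = np.random.default_rng(5)

n_vertices, n_edges, n_parts = 1000, 10000, 4
src = rng.integers(0, n_vertices, n_edges)
dst = rng.integers(0, n_vertices, n_edges)
out_deg = np.bincount(src, minlength=n_vertices).astype(float)
out_deg[out_deg == 0] = 1.0                            # avoid division by zero

part = n_vertices // n_parts
blocks = {}                                            # (src_part, dst_part) -> edge list
for s, d in zip(src, dst):
    blocks.setdefault((s // part, d // part), []).append((s, d))

rank = np.full(n_vertices, 1.0 / n_vertices)
for _ in range(20):                                    # PageRank iterations
    new_rank = np.full(n_vertices, 0.15 / n_vertices)
    for edges in blocks.values():                      # one block "fits on chip" at a time
        for s, d in edges:
            new_rank[d] += 0.85 * rank[s] / out_deg[s]
    rank = new_rank

print("top-5 ranked vertices:", np.argsort(-rank)[:5])
```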
Event‐based high throughput computing: A series of case studies on a massively parallel softcore machine
by Beaumont, Jonathan; Luk, Wayne; McLachlan Bragg, Graeme
in Communication; Field programmable gate arrays; Neural networks
2023
This paper introduces an event-based computing paradigm, where workers only perform computation in response to external stimuli (events). This approach is best employed on hardware with many thousands of smaller compute cores connected by a fast, low-latency interconnect, as opposed to traditional computers with fewer and faster cores. Event-based computing is timely because it provides an alternative to traditional big computing, which suffers from immense infrastructural and power costs. This paper presents four case study applications, including problems in computational chemistry and condensed matter physics, where an event-based computing approach finds solutions orders of magnitude more quickly than the equivalent traditional big-compute approach.
Journal Article
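A minimal software sketch of the event-based paradigm (a hypothetical illustration, not the massively parallel softcore machine itself): workers hold local state and consume no cycles until an event addressed to them arrives, and handling an event may emit further events.

```python
from collections import deque

# Workers react only to events addressed to them; idle workers do no work.
class Worker:
    def __init__(self, name):
        self.name = name
        self.state = 0

    def on_event(self, payload):
        """React to a stimulus; return follow-up events as (target, payload) pairs."""
        self.state += payload
        if payload > 1:
            return [(self.name, payload // 2)]          # spawn smaller follow-up work
        return []

workers = {i: Worker(i) for i in range(4)}
queue = deque([(i, 8) for i in workers])                # initial external stimuli

while queue:
    target, payload = queue.popleft()
    queue.extend(workers[target].on_event(payload))

print({w.name: w.state for w in workers.values()})      # e.g. {0: 15, 1: 15, 2: 15, 3: 15}
```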
GeDi: applying suffix arrays to increase the repertoire of detectable SNVs in tumour genomes
2020
Background
Current popular variant calling pipelines rely on the mapping coordinates of each input read to a reference genome in order to detect variants. Since reads deriving from variant loci that diverge in sequence substantially from the reference are often assigned incorrect mapping coordinates, variant calling pipelines that rely on mapping coordinates can exhibit reduced sensitivity.
Results
In this work we present GeDi, a suffix array-based somatic single nucleotide variant (SNV) calling algorithm that does not rely on read mapping coordinates to detect SNVs and is therefore capable of reference-free and mapping-free SNV detection. GeDi executes with practical runtime and memory resource requirements, is capable of SNV detection at very low allele frequency (<1%), and detects SNVs with high sensitivity at complex variant loci, dramatically outperforming MuTect, a well-established pipeline.
Conclusion
By designing novel suffix-array-based SNV calling methods, we have developed a practical SNV calling software tool, GeDi, that can characterise SNVs at complex variant loci and at low allele frequency, thus increasing the repertoire of detectable SNVs in tumour genomes. We expect GeDi to find use cases in targeted deep-sequencing analysis, and to serve as a replacement for and improvement over previous suffix-array-based SNV calling methods.
Journal Article
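To illustrate the mapping-free flavour of suffix-array-based variant detection (a toy sketch, not GeDi's algorithm), the example below builds a naive suffix array over a reference and anchors a read by exact substring search instead of mapper coordinates; the sequences and anchor lengths are invented.

```python
def suffix_array(text):
    """Naive construction; adequate for a toy example."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find(sa, text, pattern):
    """Binary search the suffix array for one exact occurrence of `pattern`."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sa) and text[sa[lo]:sa[lo] + len(pattern)] == pattern:
        return sa[lo]
    return -1

reference = "ACGTACGTTAGCCGATTACA"
read = "ACGTTTGCCGAT"                  # copy of reference[4:16] with one base changed
sa = suffix_array(reference)

left = find(sa, reference, read[:5])    # exact prefix anchor  -> position 4
right = find(sa, reference, read[-6:])  # exact suffix anchor  -> position 10
print("anchors in reference:", left, right)
# Between the anchored flanks the read carries a T where the reference has an A,
# a candidate single-nucleotide variant found without any read-mapping step.
```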
Ultrafast jet classification at the HL-LHC
by Summers, Sioni; Kasieczka, Gregor; Luk, Wayne
in Algorithms; Classification; Field programmable gate arrays
2024
Three machine learning models are used to perform jet origin classification. These models are optimized for deployment on a field-programmable gate array device. In this context, we demonstrate how latency and resource consumption scale with the input size and choice of algorithm. Moreover, the models proposed here are designed to work on the type of data and under the foreseen conditions at the CERN Large Hadron Collider during its high-luminosity phase. Through quantization-aware training and efficient synthetization for a specific field-programmable gate array, we show that O(100) ns inference of complex architectures such as Deep Sets and Interaction Networks is feasible at a relatively low computational resource cost.
Journal Article
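As a loose illustration of the kind of model involved (not the paper's architecture or its quantization-aware training), the sketch below implements a tiny Deep Sets-style classifier with sum pooling and applies a crude fixed-point rounding to its weights; all sizes and the quantization scheme are assumptions.

```python
import numpy as np

# Tiny Deep Sets-style classifier: a shared per-particle network, a
# permutation-invariant sum pooling, and a classifier head. Fixed-point
# rounding of weights loosely mimics quantization for low-latency inference.
rng = np.random.default_rng(6)

n_particles, n_features, hidden, n_classes = 30, 4, 16, 5
phi_w = rng.normal(size=(n_features, hidden)) * 0.5    # shared per-particle weights
rho_w = rng.normal(size=(hidden, n_classes)) * 0.5     # classifier head weights

def quantize(w, bits=8, scale=4.0):
    """Round weights to a fixed-point grid with `bits` bits over [-scale, scale)."""
    step = 2 * scale / (2 ** bits)
    return np.clip(np.round(w / step) * step, -scale, scale - step)

def deep_sets(jet, phi, rho):
    per_particle = np.maximum(jet @ phi, 0.0)           # phi applied to every particle
    pooled = per_particle.sum(axis=0)                    # order-independent pooling
    return pooled @ rho                                   # class logits

jet = rng.normal(size=(n_particles, n_features))          # one jet = a set of particles
logits_fp32 = deep_sets(jet, phi_w, rho_w)
logits_q = deep_sets(jet, quantize(phi_w), quantize(rho_w))
print("float prediction:", int(np.argmax(logits_fp32)),
      "quantized prediction:", int(np.argmax(logits_q)))
# Permuting the particles leaves the output unchanged, which is the property
# that makes Deep Sets a natural fit for unordered jet constituents.
```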