## Running Grants

# Grant opportunity

The GPU-Lab wishes to provide an opportunity for researchers to produce academic output with the Lab's monetary, infrastructural and technical support. Applications must always aim at publishing the achieved results.

# Application conditions

The application must contain a CV that emphasizes knowledge of the scientific field and programming experience, and a detailed plan of the proposed research project, no longer than 2 pages, covering the following points:

- Project title and abstract in English
- Short introduction to the scientific problem
- Weekly plan breakdown
- CPU/GPU time and development/user support needs
- Knowledge and experience in programming languages and parallel computing technologies
- Publication and other scientific outcome of the project

# Running grants

## Analysis of Gravitational Waves on LIGO/VIRGO data

##### Balázs Kacskovics

##### Supervisor: Mátyás Vasúth

**Abstract**: The announcement of GW150914 in February 2016 marked the first observation of
gravitational waves. After this first observation, two further events were detected in
October and December 2015. We have the opportunity to analyse one week of LIGO data
at the end of this year, taken from the observing run that starts this autumn.
For running, testing and possibly developing the analysis, we want to use
the computer clusters of the LIGO-Virgo Collaboration and the Wigner Institute.

## Viscous corrections from linearized Boltzmann transport

##### Dénes Molnár

**Abstract**: The calculation of self-consistent shear viscous corrections closely follows the standard determination of shear viscosity from covariant Boltzmann kinetic theory. The collision term is linearized in the deviation from local thermal equilibrium for each particle species, while the free streaming term is taken at zeroth order in the deviation, i.e., p · ∂f → p · ∂f_eq. This leads to an integral equation for the shear corrections. It can be shown that for each species i the corrections can be reduced to a one-dimensional function χᵢ(|p|) that only depends on the magnitude of the particle momentum in the fluid rest frame. The integral equation turns out to be equivalent to a variational problem for a functional that is quadratic in the χᵢ, which then becomes a linear algebra maximization problem when the χᵢ are expanded over some finite basis of suitable one-dimensional functions. The computational difficulty stems from the need to evaluate certain 4D integrals for each variational matrix element, i.e., each possible pair of particle species and basis functions, which means tens of thousands or more of such integrals. Also, unlike the calculation of the shear viscosity, which only needs the maximum value of the functional, we need the full momentum distribution of particles, i.e., an accurate determination of the actual χᵢ that maximize the functional.
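
The step from a quadratic functional to a linear algebra problem can be illustrated schematically. The abstract does not give the explicit functional, so the generic form below, the basis functions $\phi_n$, the coefficients $c_{i,n}$, and the names $Q$, $A$ and $B$ are all assumptions made for illustration only, not the authors' expressions.

```latex
% Assumed expansion over a finite basis of one-dimensional functions
\chi_i(|p|) = \sum_{n=1}^{N} c_{i,n}\,\phi_n(|p|)

% Assumed generic quadratic functional, with A symmetric and positive definite
% and c collecting all coefficients c_{i,n}
Q(c) = 2\,B^{T} c - c^{T} A\, c

% Setting the gradient to zero gives a linear system
\frac{\partial Q}{\partial c} = 2B - 2Ac = 0
\quad\Longrightarrow\quad A\,c = B

% Value of the functional at the maximum
Q_{\max} = B^{T} A^{-1} B
```

In this schematic form, each entry of $A$ corresponds to one pair of species and basis functions, which is where the large number of 4D integrals mentioned above would enter. Knowing only $Q_{\max}$ is not enough when the solution vector $c$ itself is needed.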

Zack Wolff and I wrote a C++ code that performs these calculations by nesting adaptive 1D integration routines from the GNU Scientific Library (GSL). GSL provides high-order routines that, for smooth functions, converge much faster than the 1/√N of Monte Carlo integration, even for our 4D problem. Still, the calculations took a lot of cluster computing time, and with energy dependent cross sections they will take at least one order of magnitude longer. This is why I want to investigate whether GPUs are more suitable for this problem.
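
The idea of nesting adaptive 1D routines can be sketched in a few lines. The example below is a minimal Python illustration, not the C++/GSL code described above: it uses a textbook adaptive Simpson rule as a stand-in for the GSL integrators, and the function names `adaptive_simpson` and `integrate_2d` and the test integrand are hypothetical.

```python
import math

def adaptive_simpson(f, a, b, tol=1e-9, max_depth=20):
    """Adaptive Simpson quadrature of f on [a, b] (stand-in for a GSL routine)."""
    def simpson(fa, fm, fb, lo, hi):
        return (hi - lo) / 6.0 * (fa + 4.0 * fm + fb)

    def recurse(lo, hi, fa, fm, fb, whole, tol, depth):
        mid = 0.5 * (lo + hi)
        f_left, f_right = f(0.5 * (lo + mid)), f(0.5 * (mid + hi))
        left = simpson(fa, f_left, fm, lo, mid)
        right = simpson(fm, f_right, fb, mid, hi)
        err = left + right - whole
        # Refine only where the local error estimate is still too large.
        if depth <= 0 or abs(err) <= 15.0 * tol:
            return left + right + err / 15.0
        return (recurse(lo, mid, fa, f_left, fm, left, 0.5 * tol, depth - 1)
                + recurse(mid, hi, fm, f_right, fb, right, 0.5 * tol, depth - 1))

    fa, fm, fb = f(a), f(0.5 * (a + b)), f(b)
    return recurse(a, b, fa, fm, fb, simpson(fa, fm, fb, a, b), tol, max_depth)

def integrate_2d(g, ax, bx, ay, by, tol=1e-8):
    """Integrate g(x, y) over a rectangle by nesting two adaptive 1D integrals."""
    def inner(x):
        # The inner integral is computed more accurately than the outer one,
        # so that its error does not disturb the outer error estimate.
        return adaptive_simpson(lambda y: g(x, y), ay, by, 0.1 * tol)
    return adaptive_simpson(inner, ax, bx, tol)

# Smooth test integrand with known result (1 - 1/e)^2 on the unit square.
value = integrate_2d(lambda x, y: math.exp(-(x + y)), 0.0, 1.0, 0.0, 1.0)
print(value, (1.0 - math.exp(-1.0)) ** 2)
```

Each evaluation of the outer integrand triggers a full inner integration, so the cost multiplies with every added dimension. The number of inner evaluations also varies from point to point, which is one reason adaptive schemes may map less naturally onto GPUs.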

The primary goal is to port our current single-CPU shear viscous code to GPUs. On one hand, this is useful because everything can be conveniently cross-checked against the current code, including any change of integrators, if necessary. On the other hand, while the bulk viscous case has not been worked out analytically yet, it will lead to 4D integrals that are very similar to the shear viscous case (only the kernel will change). Therefore, once the shear viscous calculations are adapted to GPUs, bulk viscous corrections will be straightforward to code.

The first step will be to investigate whether the adaptive 1D integrators in GSL can be ported to OpenCL. If that turns out to be infeasible or inefficient, I will move on to other high-order integrators (Gauss-Kronrod, Gauss-Laguerre, etc.) or give up adaptivity and use a fixed high number of points so that one can run efficiently on GPUs. The next step will be to increase the number of dimensions steadily up to full 4D, via nesting 1D integrations. Finally, I will port the shear viscous integration kernels, test the port against the current CPU-based code extensively, and try to optimize the code on GPUs as much as possible.
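
The non-adaptive fallback mentioned above, a fixed number of points per dimension nested up to 4D, might look like the following Python sketch. It uses a Gauss-Legendre rule as one possible fixed high-order integrator; the choice of rule, the function names `gauss_legendre` and `integrate_nd`, and the test integrand are assumptions for illustration, not part of the existing code.

```python
import math

def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for k in range(1, n + 1):
        x = math.cos(math.pi * (k - 0.25) / (n + 0.5))  # initial guess for the k-th root
        for _ in range(100):
            # Evaluate P_n(x) and P_{n-1}(x) with the three-term recurrence.
            p_prev, p = 1.0, x
            for j in range(2, n + 1):
                p_prev, p = p, ((2 * j - 1) * x * p - (j - 1) * p_prev) / j
            dp = n * (x * p - p_prev) / (x * x - 1.0)  # derivative P_n'(x)
            dx = p / dp
            x -= dx  # Newton step towards the root of P_n
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def integrate_nd(f, bounds, n=8):
    """Integrate f over a box by nesting a fixed n-point 1D rule in every dimension."""
    nodes, weights = gauss_legendre(n)

    def nest(dim, point):
        if dim == len(bounds):
            return f(*point)
        a, b = bounds[dim]
        half, mid = 0.5 * (b - a), 0.5 * (a + b)
        return half * sum(w * nest(dim + 1, point + [mid + half * x])
                          for x, w in zip(nodes, weights))

    return nest(0, [])

# 4D test integrand with known result (1 - 1/e)^4 on the unit hypercube.
approx = integrate_nd(lambda x, y, z, t: math.exp(-(x + y + z + t)),
                      [(0.0, 1.0)] * 4)
print(approx, (1.0 - math.exp(-1.0)) ** 4)
```

With a fixed rule, every integral evaluates the integrand at the same n^4 points, so the work per integral is identical and known in advance. That regularity is what makes this variant a plausible candidate for GPUs, at the price of losing the error control of an adaptive scheme.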

## Parallelized Transport and Corrections to Equilibrium Phase Space Distributions

##### Mridula Damodaran

**Abstract**: When modeling a heavy ion collision, hydrodynamics is not applicable in the regime where deviations from local thermal equilibrium are not guaranteed to be small. Further, the fact that experiments detect particles warrants a switch from a fluid dynamical picture to a particle picture. One approach to modeling particles with non-equilibrium dynamics utilizes the relativistic Boltzmann Transport Equation (BTE). The BTE describes the evolution of particle phase space distributions via collision terms for various scattering processes. While the elastic 2 → 2 collision terms are useful to study the approach to thermal equilibrium, the radiative 2 ↔ 3 collision terms are necessary to study the approach to chemical equilibrium in systems with changing particle number.

The existing MPC/Grid code solves the BTE by discretizing space into cells on a grid, and simulating propagation and interactions of point particles within each cell. Within a single time step in the evolution of the system, every particle in a given cell is equally likely to interact with every other particle in the cell. Within each cell in the grid, the code loops through all possible pairs and triplets and generates a random number for each. The random number is compared to a collision probability in order for a decision to be made as to whether or not a collision occurs. The 2 → 3 collision probability is determined via a 2D integral for each pair. This integral takes up a significant amount of calculation time, because it needs to be carried out serially for every single pair in each cell in the system.
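
The per-cell pair loop described above can be sketched as follows. This is a simplified Python illustration, not the MPC/Grid implementation: `collide_pairs` and the `pair_probability` callback are hypothetical names, and the constant probability in the usage lines is a placeholder for the value that, in the 2 → 3 case, would come from a 2D integral for each pair.

```python
import itertools
import random

def collide_pairs(cell_particles, pair_probability, rng):
    """Return the particle pairs of one cell that collide in this time step."""
    collisions = []
    for a, b in itertools.combinations(cell_particles, 2):
        # One random number per pair, compared with that pair's collision probability.
        if rng.random() < pair_probability(a, b):
            collisions.append((a, b))
    return collisions

# Usage with a placeholder probability that is the same for every pair.
rng = random.Random(1)
particles = list(range(6))            # stand-ins for the particles in one cell
hits = collide_pairs(particles, lambda a, b: 0.2, rng)
print(len(hits), "of", 6 * 5 // 2, "pairs collide")
```

The number of pairs grows quadratically with the number of particles in a cell, and every pair needs its own probability. Because these evaluations are independent of each other, they are a natural candidate for parallel execution.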

The grid nature of the algorithm requires a high enough number of particles per cell and small enough time steps in order to obtain realistic results. Increasing the number of test particles is computationally expensive, especially with the current algorithm for 2 → 3 collision checks. To improve the feasibility of simulations with large numbers of test particles, we created a parallelized version of this code using Message Passing Interface (MPI). This version divides the spatial grid into “subsystems” that are allocated to different cores.
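
How a grid might be divided into subsystems can be illustrated with a simple index split. The text does not describe the actual decomposition used in the MPI version, so the contiguous one-dimensional split below and the function name `split_cells` are assumptions chosen only to show the idea.

```python
def split_cells(n_cells, n_ranks):
    """Assign contiguous ranges of cell indices to ranks, as evenly as possible."""
    base, extra = divmod(n_cells, n_ranks)
    ranges, start = [], 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional cell each.
        size = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

# Each rank would then run the collision loop only over its own cells.
for rank, cells in enumerate(split_cells(10, 3)):
    print(rank, list(cells))
```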

While the MPI parallelization did result in a speed up, the integrals for collision checks are still carried out in serial loops within each subsystem. The 2D integral that is carried out for every single pair presents an opportunity for parallelization using GPUs. We want to investigate the computational efficiency of such a parallelization. A speed up via GPUs would allow for complex systems to be studied within reasonable amounts of time.

The first application of such a parallelized code would be to expand an existing study. We have already studied the evolution of deviations from equilibrium in phase space distributions in a 1D expanding system. We compared this evolution to that predicted by various models based purely on hydrodynamic fields. This study was limited to 2 → 2 interactions only. Around 300 million particles were required in order for the results to be trusted. Including 2 ↔ 3 interactions and transverse expansion would describe the behavior of a more realistic system without particle number conservation. However, these effects are expected to require even higher particle statistics, thus calling for further speed ups in MPC/Grid and motivating us to attempt a speed up via GPUs.

## Numerical Studies of Lattice Loop Equations in Pure Gauge Theory

##### Peter D. Anderson

**Abstract**: Monte Carlo lattice simulations of pure Yang-Mills theory, such as the one proposed here, have been shown to be ideally suited for GPU computations since all changes in the action are local (as opposed to dynamical fermions that are non-local in the lattice simulation). The action is written in terms of link variables and only depends on nearest neighbors. Thus any lattice site with even (or odd) parity can be run simultaneously. The link is an element of the gauge group SU(NC) and acts as a parallel transporter from one site to the next site. A Wilson Loop is defined as the trace of a product of links associated with a closed path in the lattice. Its expectation value is measured by averaging over a large number of statistically independent configurations that are obtained in the simulation. After such independent configurations are obtained and saved, the Wilson loop averages can be measured off-line.
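
The even/odd site splitting mentioned above can be checked with a short sketch. The Python below is an illustration only, not the CUDA code: it assumes a periodic hypercubic lattice with an even number of sites in every direction, and the function names `parity_classes` and `neighbours` are hypothetical.

```python
import itertools

def parity_classes(shape):
    """Split the sites of a hypercubic lattice into even and odd sites."""
    even, odd = [], []
    for site in itertools.product(*(range(n) for n in shape)):
        # Parity of a site is the parity of the sum of its coordinates.
        (even if sum(site) % 2 == 0 else odd).append(site)
    return even, odd

def neighbours(site, shape):
    """Nearest neighbours of a site with periodic boundary conditions."""
    result = []
    for mu, n in enumerate(shape):
        for step in (1, -1):
            shifted = list(site)
            shifted[mu] = (shifted[mu] + step) % n
            result.append(tuple(shifted))
    return result

shape = (4, 4, 4, 4)  # even extent in every direction (assumption)
even, odd = parity_classes(shape)
odd_sites = set(odd)
# Every nearest neighbour of an even site is odd, so even sites never touch each other.
print(all(nb in odd_sites for s in even for nb in neighbours(s, shape)))
```

Since no two sites of the same parity are nearest neighbours, a local update on one of them does not read data that another one is writing. Each parity class can therefore be handed to parallel threads in a single pass.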

Currently, we have implemented the Monte Carlo simulation for NC = 3 using CUDA on two Tesla C2070 GPUs using the Wilson Action, which is the simplest action that reproduces the correct continuum limit. Each link is updated by multiplying it by a matrix randomly chosen from a pool of randomly generated matrices in the gauge group. These updates are accepted or rejected via the Metropolis algorithm to minimize the action. To optimize parallelizability, the updates are performed several times at each link to bring it into a state of “thermal equilibrium” with its neighboring links. We have performed calculations of glueball mass spectra which took advantage of improved actions that involved the next smallest Wilson Loop in the action as well.
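
The update scheme described above (a proposal drawn from a pool, a Metropolis accept/reject step, several hits per link) can be illustrated on a deliberately tiny toy model. The Python sketch below is not the SU(3) CUDA code: it assumes a single U(1) phase with the toy action S = -β cos θ, so that multiplying by a group element becomes adding an angle, and the names `make_pool` and `metropolis_sweep` are hypothetical.

```python
import math
import random

def make_pool(size, spread, rng):
    """Pool of random U(1) rotation angles; each angle is stored with its inverse."""
    pool = []
    for _ in range(size // 2):
        delta = rng.uniform(-spread, spread)
        pool.extend([delta, -delta])  # inverses keep the proposal symmetric
    return pool

def metropolis_sweep(theta, beta, pool, rng, hits=5):
    """Apply several Metropolis hits to one angle with toy action S = -beta*cos(theta)."""
    for _ in range(hits):
        proposal = theta + rng.choice(pool)
        delta_s = -beta * (math.cos(proposal) - math.cos(theta))
        # Standard Metropolis rule: accept with probability min(1, exp(-delta_S)).
        if delta_s <= 0.0 or rng.random() < math.exp(-delta_s):
            theta = proposal
    return theta

rng = random.Random(2024)
pool = make_pool(50, 1.5, rng)
theta, total, n_sweeps = 0.0, 0.0, 20000
for _ in range(n_sweeps):
    theta = metropolis_sweep(theta, 1.0, pool, rng)
    total += math.cos(theta)
print(total / n_sweeps)  # estimate of <cos(theta)> at beta = 1
```

Storing each pool element together with its inverse keeps the proposal symmetric, which the simple acceptance rule used here relies on. For this toy action the exact average is the Bessel function ratio I1(β)/I0(β), about 0.446 at β = 1, which gives a simple check of the sampler.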

We wish to extend our studies to larger NC and larger lattice sizes so we can determine the region where the loop equations are numerically satisfied and where large-NC simplifications are applicable. Since the loop equations are the Schwinger-Dyson equations for the Wilson Loops, they are dependent on the form of the action and need to be adjusted accordingly. However, we should find that they still hold. In order to investigate the phase space, we need access to a larger number of GPUs than we currently have. Our goal is to rewrite the existing CUDA code for NC = 3 in OpenCL and allow for generic NC. In particular we would like to verify the approach to large-NC behavior at NC = 8.

Finally we would like to emphasize that the loop equations are theoretically exact equations, valid for any value of NC and for the lattice, independently of the continuum limit. Therefore they are an ideal tool to evaluate the Monte Carlo algorithm.

## Optimization and Development of High-performance Computing Pipeline to Search for Gravitational Radiation from Rotating Neutron Stars by Means of GPU-based Hardware Accelerators

##### Michał Bejger

**Abstract**: The aim of this project is to develop a production-ready version of the data-analysis pipeline that searches for gravitational-wave signals in data from the network of Advanced Era LIGO and Virgo interferometric detectors. The algorithm developed by the Polish Virgo-POLGRAW group aims at finding almost-monochromatic gravitational-wave signals from rotating, non-axisymmetric, isolated neutron stars. The detection of such signals will open an exciting possibility of studying the physics of neutron-star interiors, their elastic properties and the structure of the crust. A joint project between Hungarian high-performance computing experts and gravitational-wave experts will benefit both sides and initiate a long-term collaboration in this field.