Rigging Magic | Ishayu S. Shikhare


Overview

A Magic 8 Ball is a deceptively simple object: a die suspended in fluid inside a sealed sphere. Shake it, and the fluid sloshes, the die tumbles, and eventually one face settles against the viewing window. Reproducing that motion faithfully is a coupled physics problem—thousands of fluid particles interacting with each other, the rigid die, and the spherical container—at a scale where brute-force simulation is far too slow for real-time use.

This project, built with Aidan Vogt, implements that simulator with smoothed particle hydrodynamics (SPH), first on the CPU, and then ports the full timestep pipeline to CUDA so that particle updates run in parallel on the GPU. The fluid uses Poly6, spiky, and viscosity kernels for density, pressure, and viscous forces; wall ghost particles provide boundary support near the sphere; and a neutrally buoyant rigid cube die is coupled two-way to the fluid through contact and drag. Time stepping is adaptive, driven by acoustic, force, and viscous CFL limits.
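For readers unfamiliar with SPH, the three kernels and the adaptive step can be sketched as below. This is a minimal illustration, not the project's code: it assumes the standard 3D kernel forms from the SPH literature, and the function names, the `CflCoefficients` struct, and its default safety factors (0.4, 0.25, 0.125) are hypothetical choices made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

constexpr double kPi = 3.14159265358979323846;

// Poly6 kernel value W(r, h) in 3D, typically used for density estimation.
// r is the distance between two particles, h the smoothing radius.
double poly6(double r, double h) {
    if (r < 0.0 || r > h) return 0.0;
    const double d = h * h - r * r;
    return 315.0 / (64.0 * kPi * std::pow(h, 9)) * d * d * d;
}

// Radial derivative dW/dr of the spiky kernel, typically used for pressure.
// Returns 0 at r = 0, where the gradient direction is undefined.
double spikyGradient(double r, double h) {
    if (r <= 0.0 || r > h) return 0.0;
    const double d = h - r;
    return -45.0 / (kPi * std::pow(h, 6)) * d * d;
}

// Laplacian of the viscosity kernel, typically used for viscous forces.
double viscosityLaplacian(double r, double h) {
    if (r < 0.0 || r > h) return 0.0;
    return 45.0 / (kPi * std::pow(h, 6)) * (h - r);
}

// Hypothetical safety factors for the three step-size limits.
struct CflCoefficients {
    double acoustic = 0.4;
    double force = 0.25;
    double viscous = 0.125;
};

// Picks the largest step that satisfies all three limits:
//   acoustic: dt <= C_a * h / c_s
//   force:    dt <= C_f * sqrt(h / |a|_max)
//   viscous:  dt <= C_v * h^2 / nu
// A non-positive input disables the corresponding limit.
double adaptiveTimestep(double h, double soundSpeed, double maxAccel,
                        double kinematicViscosity,
                        const CflCoefficients& c = CflCoefficients()) {
    const double inf = std::numeric_limits<double>::infinity();
    const double dtAcoustic =
        soundSpeed > 0.0 ? c.acoustic * h / soundSpeed : inf;
    const double dtForce =
        maxAccel > 0.0 ? c.force * std::sqrt(h / maxAccel) : inf;
    const double dtViscous =
        kinematicViscosity > 0.0
            ? c.viscous * h * h / kinematicViscosity
            : inf;
    return std::min({dtAcoustic, dtForce, dtViscous});
}
```

In a step loop, `adaptiveTimestep` would be called once per step with the current maximum particle acceleration, so the step shrinks automatically when the ball is shaken hard and grows again as the fluid settles.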

The hard part is not the physics formulas; it is making neighbor search and force accumulation fast enough to matter. Naive all-pairs search is O(N^2); we use uniform spatial hashing (later geometric binning on GPU) so each particle only checks nearby cells. On the GPU, die reaction forces require reductions across threads, and early versions paid heavily for host–device transfers every step. Keeping state on the device, switching from Thrust to pre-allocated CUB sorts, fusing kernels, and eliminating redundant checks in neighbor-list construction were the main levers that took us from modest speedups to a ~96× steady-state improvement over the serial CPU baseline at our target problem size (~3,500 particles in a realistic 8-ball geometry).
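The idea behind the uniform-grid search can be sketched on the CPU as follows. This is an illustrative sketch only: the `Vec3`, `UniformGrid`, and `bruteForceNeighbors` names are hypothetical, the cell size is assumed equal to the smoothing radius, and a hash map stands in for whatever cell storage the actual CPU and GPU builds use.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec3 { double x, y, z; };

inline double dist2(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Bins particles into cubic cells of side cellSize (assumed equal to the
// smoothing radius), so a radius query only visits the 27 surrounding cells.
class UniformGrid {
public:
    UniformGrid(const std::vector<Vec3>& points, double cellSize)
        : points_(points), cellSize_(cellSize) {
        for (std::size_t i = 0; i < points_.size(); ++i) {
            const Vec3& p = points_[i];
            cells_[key(cellOf(p.x), cellOf(p.y), cellOf(p.z))].push_back(i);
        }
    }

    // Indices of all particles within cellSize of particle i, sorted.
    std::vector<std::size_t> neighbors(std::size_t i) const {
        const Vec3& p = points_[i];
        const std::int64_t cx = cellOf(p.x), cy = cellOf(p.y), cz = cellOf(p.z);
        std::vector<std::size_t> out;
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz) {
                    auto it = cells_.find(key(cx + dx, cy + dy, cz + dz));
                    if (it == cells_.end()) continue;
                    for (std::size_t j : it->second)
                        if (j != i &&
                            dist2(p, points_[j]) <= cellSize_ * cellSize_)
                            out.push_back(j);
                }
        std::sort(out.begin(), out.end());
        return out;
    }

private:
    std::int64_t cellOf(double v) const {
        return static_cast<std::int64_t>(std::floor(v / cellSize_));
    }

    // Packs three cell coordinates into one collision-free 64-bit key.
    // Valid while each coordinate lies in [-2^20, 2^20).
    static std::uint64_t key(std::int64_t ix, std::int64_t iy,
                             std::int64_t iz) {
        const std::int64_t off = std::int64_t{1} << 20;
        return (static_cast<std::uint64_t>(ix + off) << 42) |
               (static_cast<std::uint64_t>(iy + off) << 21) |
               static_cast<std::uint64_t>(iz + off);
    }

    std::vector<Vec3> points_;  // private copy of the particle positions
    double cellSize_;
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> cells_;
};

// O(N) per query (O(N^2) overall) reference used to check the grid.
std::vector<std::size_t> bruteForceNeighbors(const std::vector<Vec3>& pts,
                                             std::size_t i, double h) {
    std::vector<std::size_t> out;
    for (std::size_t j = 0; j < pts.size(); ++j)
        if (j != i && dist2(pts[i], pts[j]) <= h * h) out.push_back(j);
    return out;
}
```

Building the grid is linear in the particle count, and each query touches only the particles in 27 cells rather than all N. A hash map is convenient on the CPU; a GPU version would more likely sort particles by cell key so that each cell's particles are contiguous in memory, which is consistent with the sort-based approach mentioned above.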

We hit our real-time goal: the GPU simulator sustains 60 Hz physics stepping. Rendering was the next bottleneck—the matplotlib visualizer could not keep up—so we added a lightweight C++ OpenGL fast viewer that streams SIM2 frames for live remote preview. The CPU and GPU builds share the same binary frame format, so behavior can be compared and debugged offline.

If you want the full pipeline breakdown, profiling tables, and optimization ablation, the final report has the complete details. The proposal and mid-project update trace how the plan evolved from a CPU-first schedule to a CUDA-focused finish.