GPU_programming

1

How I learned Vulkan and wrote a small game engine with it (edw.is)

submitted 4 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

Currently getting Hacker News hug of death right now, but hopefully in a few days the traffic subsides. From what I could load, it looks like a good article.

https://web.archive.org/web/20240606103630/https://edw.is/learning-vulkan/

Archive.org does have a mirror.

2

1

Porting HPC Applications to AMD Instinct MI300A Using Unified Memory and OpenMP (arxiv.org)

submitted 5 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

3

1

Inside the Snapdragon 855’s iGPU (chipsandcheese.com)

submitted 5 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

4

1

Learning DirectX 12 (www.3dgep.com)

submitted 6 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

Sorry for the spam, just did a bit of a research / search and decided I needed to save at least these DirectX12 links and read up on them later.

5

1

Learning DirectX 12 in 2023 (whoisryosuke.com)

submitted 6 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

6

1

Alex Tardif: Using D3D12 in 2022 (alextardif.com)

submitted 6 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

7

1

GitHub: Use Direct3D 12 Compute Shader in C (Basic) (github.com)

submitted 6 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

Just searching for stuff, and this came up. I figure it was worth saving here.

8

1

Geohot's AMD 7900xtx research (github.com)

submitted 7 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

9

1

HLSL Shader Model 6.8 (microsoft.github.io)

submitted 7 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

Looks like the update's big feature is "Work Graphs".

10

1

How do I become a graphics programmer? - A small guide from the AMD Game Engineering team (gpuopen.com)

submitted 7 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

11

1

Shader Printf in HLSL and DX12 (therealmjp.github.io)

submitted 9 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

Someone wanted a more portable printf implementation, so they created one for themselves.

I'll be giving this article a good look over for sure.

12

1

Mesh Shaders on RDNA™ Graphics Cards (gpuopen.com)

submitted 10 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

13

1

Efficiently Processing Large Relational Joins on GPUs (arxiv.org)

submitted 10 months ago* (last edited 10 months ago) by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

Saving this .pdf here.

The relational join operator is a very memory-intensive and even computationally-intensive operation. Though real-life databases can be in the TB range, there are a number of applications of smaller, memory-only databases that could feasibly fit in the 4GB or 8GB of smaller GPUs.

Its a well known fact that relational-joins (and joins-of-joins) can be parallelized. Database programmers meticulously perform planning-algorithms to optimize this important operation and parallelize it across cores or even systems. Seeing research into a natural GPU application warms my heart at least!

GPUs are well known to parallelize and improve upon sorting algorithms (see embarrassingly parallel solutions like Bitonic Sort... but also GPU-specific / SIMD-designed sorting algorithms like MergePath). One of the most common ways to perform a relational join is to sort both sets of data on the relational-join, and then linearly scan through both relations matching up (left.blah == right.blah). This paper seems to take this approach and measures how good GPUs are at this. (At least, for data that does fit in the GPU RAM).

There's also "Hash-Join", which is investigated in this paper as well.

14

1

Designing a SIMD Algorithm from Scratch · mcyoung (mcyoung.xyz)

submitted 11 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

15

1

Solving MaxSAT with Matrix Multiplication (hgpu.org)

submitted 11 months ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

The abstract stuck out to me, and I like dabbling in the 3SAT stuff on a hobby level.

The gist is that these researchers have utilized the TensorCores / FP16 Matrix Multiplication routines found in neural-network chips/instructions to start searching for MaxSAT (which seems to be related to 3SAT somehow, I'll be reading more about this...)

16

1

The Book of Shaders (thebookofshaders.com)

submitted 1 year ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

Found this reference online, figured I'd save it here. Looks like an excellent introduction to fragment/pixel shaders.

17

1

AMD Enables ROCm and PyTorch on Radeon RX 7900 XTX (www.tomshardware.com)

submitted 1 year ago by AlmightySnoo@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

18

1

Compiling NumPy code into C++ or CUDA via torch.compile (pytorch.org)

submitted 1 year ago by AlmightySnoo@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

19

1

Implementing a GPU's Programming Model on a CPU (litherum.blogspot.com)

submitted 1 year ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

Seems like a personal project that is basically a personal-version of the Intel "ispc" tool. Still, 2nd or 3rd programming languages of this nature isn't a bad thing, if anything, we need more ideas and more implementations to figure out how best to map GPU-like programming to AVX512 or other CPU SIMD languages.

20

1

ROCm Is AMD’s No. 1 Priority, Exec Says (www.eetimes.com)

submitted 1 year ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

21

1

Analyzing Starfield’s Performance on Nvidia’s 4090 and AMD’s 7900 XTX (chipsandcheese.com)

submitted 1 year ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

A solid example of how to perform performance analysis on modern GPUs and video games.

DirectX programmers (and GPU programmers of all kinds) probably should use this article as a template for thinking about GPU-performance in relationship to a greater task.

You search to find the "slowest shader", you analyze it and its parallelism to see if its adequate. You hone in and perform deeper analysis. I guess a programmer goes one step further and needs to think of a solution / improvement though (rather than "just" benchmarking).

But this style of analysis is very useful and helpful in the optimization process.

22

1

WebGL shader examples (webgl-shaders.com)

submitted 1 year ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

23

1

Simplifying GPU Application Development with Heterogeneous Memory Management | NVIDIA Technical Blog (developer.nvidia.com)

submitted 1 year ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

24

1

N-Queens project from over 10 years ago - StreamHPC (streamhpc.com)

submitted 1 year ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink

StreamHPC here goes over some concepts to solve N-Queens on a GPU.

N-Queens is a classic "homework problem" for traditional AI courses (back when search algorithms were considered AI at least). As usual, the GPU has a couple of changes if you want to run as fast as possible.

25

1

Christian Mills - Testing Intel’s Arc A770 GPU for Deep Learning Pt. 2 (christianjmills.com)

submitted 1 year ago by dragontamer@lemmy.world to c/gpu_programming@lemmy.world

0 comments fedilink