04 — YUV on the GPU

Tutorial 03 let the module convert each frame to RGBA on the CPU. This is the faster route: borrow the decoder’s native YUV420p planes (no conversion, no copy), upload them to three GL_R8 textures, and do the YUV→RGB in a fragment shader. The CPU does nothing but hand over pointers, and ~2.7× less data crosses the bus (1.5 bytes/pixel instead of 4).

The borrow

The YUV get_data overload hands you three planes — Y at full resolution, U (cb) and V (cr) at quarter resolution. Each is packed at its own stride (video_y_stride / video_uv_stride), which is macroblock-rounded and so may exceed the display width; GL_UNPACK_ROW_LENGTH handles that on upload:

        player |> get_data() $(y, u, v : array<uint8>#) {
            glPixelStorei(GL_UNPACK_ALIGNMENT, 1)
            glActiveTexture(GL_TEXTURE0)
            glBindTexture(GL_TEXTURE_2D, texY)
            glPixelStorei(GL_UNPACK_ROW_LENGTH, ys)
            glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, vw, vh, GL_RED, GL_UNSIGNED_BYTE, unsafe(addr(y[0])))
            glActiveTexture(GL_TEXTURE1)
            glBindTexture(GL_TEXTURE_2D, texU)
            glPixelStorei(GL_UNPACK_ROW_LENGTH, us)
            glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, vw / 2, vh / 2, GL_RED, GL_UNSIGNED_BYTE, unsafe(addr(u[0])))
            glActiveTexture(GL_TEXTURE2)
            glBindTexture(GL_TEXTURE_2D, texV)
            glPixelStorei(GL_UNPACK_ROW_LENGTH, us)
            glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, vw / 2, vh / 2, GL_RED, GL_UNSIGNED_BYTE, unsafe(addr(v[0])))
            glPixelStorei(GL_UNPACK_ROW_LENGTH, 0)
        }

The three samplers are bound to distinct texture units with @stage = 0/1/2 (miss this and every sampler reads unit 0 — the chroma samples the luma, and you get a green/magenta image). The fragment shader applies the BT.601 studio-swing matrix, matching pl_mpeg’s own plm_frame_to_rgb so the output is identical to the RGBA path, byte for byte.

Self-verifying

A headless test confirms the plane shapes the GPU path relies on — Y full-res, U/V quarter-res — without a display:

require video
require daslib/defer

[export]
def main() {
    var p = video_open("tutorials/04_yuv_shader/sample.mpg")
    verify(p != null, "the sample opens")
    defer() { video_close(p) }
    verify(video_decode(p), "decodes a frame")
    let h = video_height(p)
    p |> get_data() $(y, u, v : array<uint8>#) {
        verify(length(y) == video_y_stride(p) * h, "Y plane is y_stride x height (full-res)")
        verify(length(u) == video_uv_stride(p) * (h / 2), "U plane is uv_stride x height/2 (quarter-res)")
        verify(length(v) == length(u), "V plane matches U")
    }
    print("tutorial 04 ok: {video_width(p)}x{h}, y_stride {video_y_stride(p)}, uv_stride {video_uv_stride(p)}\n")
}

Run it

cd <dasVideo>
daslang -load_module . examples/play_yuv_gl.das -- \
    --video tutorials/04_yuv_shader/narration.mpg

Next

05 — Sound, in sync adds the last piece: sound, and keeping the picture in step with it.