This is an article that I wrote as part of my work at Belledonne Communications, the company making Linphone.
What you are reading here is the original unabridged version. Click here to read the shorter official version on the Linphone website.
You can also just skip to the code.
OpenGL, Colourspaces, and Linphone
What is OpenGL? What is Y'CbCr? And why does Linphone need any of this?
Linphone is a real-time communication application that supports video calls.
To enable your friend to smile at you through their webcam, Linphone relies on a (complex) pipeline that takes in RTP packets from the network and outputs pixels on your screen. (This works in reverse for your friend: their SIP client needs to take input from their webcam and send it through the network.)
In this article, we will focus on the very last part of this pipeline: The video renderer.
The renderer takes its input from the video decoder (H.264, VP8, etc.) and paints its output to a buffer, which is the rectangle you see on your screen (with the smiling face of your friend).
This is where we encounter Y'CbCr and colourspaces. For various reasons (some historical), the output of a video decoder is a frame in a Y'CbCr colourspace. Most programmers and digital artists are used to representing images in the RGB colourspace: a grid of pixels with three channels, Red, Green, and Blue (and sometimes an Alpha channel for transparency). A Y'CbCr frame is almost the same, except it encodes colour with three different channels: Luma (Y'), Chroma blue (Cb), and Chroma red (Cr).
At this point, and for details on how this colourspace works, I encourage you to read the summary and the Rationale section of the Wikipedia article on Y'CbCr.
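To give a first taste (these are the analogue "Y'PbPr" relations sketched from that article; the digital Y'CbCr encoding adds offsets and scaling on top, which we will meet below as "partial range"):

$$Y' = K_r R' + K_g G' + K_b B'$$
$$C_B = \frac{B' - Y'}{2(1 - K_b)} \qquad C_R = \frac{R' - Y'}{2(1 - K_r)}$$

where $K_r + K_g + K_b = 1$, and the three constants weight how much each primary contributes to perceived brightness.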
All that is left for the renderer to do, then, is to translate the frame to the RGB colourspace, scale it to match the resolution of the output buffer, and draw the result onto that buffer. This is a massively parallel process, however, as it needs to be done for every pixel. If you have a background in computer graphics, you know what I'm getting at: this is a job for your graphics card!
To be able to talk to the graphics card, you need OpenGL.
OpenGL (Open Graphics Library) is a cross-platform library and standard that allows a programmer to control a GPU (graphics processing unit) to carry out tasks such as 2D and 3D rendering. It is well-known in the video game industry, along with DirectX and Metal (equivalent libraries that are not cross-platform), and is now slowly being obsoleted by the new Vulkan standard.
So there you have it: Linphone uses OpenGL to perform the Y'CbCr → RGB translation for video rendering.
Y'CbCr or YUV?
A note on nomenclature: throughout this article, I use the term Y'CbCr, as it is more accurate and is the term used in academia. The term YUV (which originally designated an analogue encoding only) is nevertheless the one frequently used in the computer industry.
Porting legacy OpenGL 2.1 code to OpenGL 4.1
... Or how I had to reverse engineer our own code, and the importance of documenting said code.
So. Story time. (I will try to keep this short.) Once upon a time, Linphone stopped working on macOS.
The error seemed to indicate that we were using an old version of OpenGL that was not supported anymore. (As it turns out, we weren't asking nicely enough, but that's a different story.) So I set out to investigate, decided the code was small enough to be ported to newer versions of OpenGL, and... down the rabbit hole I went. (Interesting trivia: the video renderer presented in this article is the only bit of OpenGL we have in the whole Linphone code base.) At that point, I had zero knowledge of YUV (not to mention Y'CbCr) but a basic understanding of GLSL (the OpenGL Shading Language).
And this is what I had to port: (All code in this article is licensed under the GPLv3, like the Linphone source code.)
#ifdef GL_ES
precision mediump float;
#endif

uniform sampler2D t_texture_y;
uniform sampler2D t_texture_u;
uniform sampler2D t_texture_v;

varying vec2 uvVarying;

void main()
{
    float y, u, v, r, g, b, gradx, grady;
    y = texture2D(t_texture_y, uvVarying).r;
    u = texture2D(t_texture_u, uvVarying).r;
    v = texture2D(t_texture_v, uvVarying).r;
    y = 1.16438355 * (y - 0.0625);
    u = u - 0.5;
    v = v - 0.5;
    r = clamp(y + 1.596 * v, 0.0, 1.0);
    g = clamp(y - 0.391 * u - 0.813 * v, 0.0, 1.0);
    b = clamp(y + 2.018 * u, 0.0, 1.0);
    gl_FragColor = vec4(r, g, b, 1.0);
}
As of January 2022, while I was working on this, that code was 10 years old. No one had touched it since it was written (not a single commit), and the guy who wrote it left the company some 4 years later. If at first glance you have any clue what this is all doing, then either you are that guy (hi!), or you have seen YUV→RGB code before.
Neither applied to me, so I had to choose: should I just try to translate it, or should I try to understand it as well? For several reasons, I chose the latter (easier debugging once ported, ease of reimplementation in another graphics API, plain curiosity, ...).
So I searched the web for "yuv2rgb" (the name of the file), as well as for some of the magic numbers it used, like "1.16438355".
I found many examples of code (mostly C) that used similar constants to "convert YUV to RGB" (whatever that meant), but none explained how those constants were obtained.
I also landed on the YUV Wikipedia article. There I learnt that YUV data can come in different chroma subsampling schemes, e.g. YUV 4:4:4, YUV 4:2:0, YUV 4:1:1, etc.
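For instance, here is a quick illustrative sketch (plain C, not Linphone code) of what 4:2:0 subsampling means for the size of each plane of a 3-planar frame:

#include <stdio.h>

/* Illustrative sketch, not Linphone code: plane sizes of a 3-planar
 * Y'CbCr 4:2:0 frame. The luma plane keeps full resolution, while each
 * chroma plane is subsampled by 2 both horizontally and vertically. */
int main(void)
{
    const int width = 1280, height = 720;
    const int luma_size   = width * height;             /* one sample per pixel */
    const int chroma_size = (width / 2) * (height / 2); /* a quarter as many */

    printf("Y' plane: %d bytes\n", luma_size);   /* 921600 */
    printf("Cb plane: %d bytes\n", chroma_size); /* 230400 */
    printf("Cr plane: %d bytes\n", chroma_size); /* 230400 */
    printf("total: %d bytes, versus %d in 4:4:4\n",
           luma_size + 2 * chroma_size, 3 * luma_size);
    return 0;
}

That is half the data of the equivalent 4:4:4 frame, which is part of why video codecs are so fond of it.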
But this only prompted more questions like "which subsampling is this shader supposed to take in?".
The article contained some numbers and formulas but — at first glance — none seemed to match the magic numbers in my shader of interest...
After some more pointless browsing, I was starting to lose hope. That meant one thing: I had to change gears.
Quick digression: In my work as a programmer, I feel like I use two different paces to browse for answers on the Internet.
- A fast, exploratory one, where I skim through contents and jump from search to search,
- and a slow, comprehensive one.
I usually use the first when I know what I'm looking for, or — as was the case here — when I want to determine the feasibility of something before investing too much time into it.
That rushed, exploratory pace was not yielding satisfactory answers, so I decided I needed something more exhaustive and less exhausting. I breathed in, calmed down, and took the time to investigate and understand my best lead: that YUV Wikipedia article.
This led me to the Y'CbCr article, where everything finally started to make sense.
I had finally found where my magic numbers were coming from: They are the result of simplifying (in the mathematical sense) the inverse conversion matrix from the ITU-R BT.601 standard, mixed with partial range shifting and scaling! Notably, the 1.16438355 factor I mentioned earlier is a rounded version of the 255 / 219 scale factor that can be found in the "ITU-R BT.601 conversion" section of that article.
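In fact, every constant in the legacy shader can be recomputed from the values in that section. As a back-of-the-envelope check:

$$\frac{255}{219} \approx 1.16438 \qquad \frac{255}{224} \times 2(1 - K_r) \approx 1.596 \qquad \frac{255}{224} \times 2(1 - K_b) \approx 2.017$$
$$\frac{255}{224} \times \frac{K_b}{K_g} \times 2(1 - K_b) \approx 0.392 \qquad \frac{255}{224} \times \frac{K_r}{K_g} \times 2(1 - K_r) \approx 0.813$$

with $K_r = 0.299$, $K_b = 0.114$, and $K_g = 1 - K_r - K_b = 0.587$. (The legacy 2.018 and 0.391 appear to be slightly coarser roundings of 2.0172 and 0.3918.)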
I know, I know: I don't understand half of this either.
But the point is that my numbers went from magic to scientific. I don't need to understand the physics of phosphor light emission and how Kr, Kg, and Kb were chosen; I just need to know that that's how they were chosen. What I do need to know is that one day we might want to switch to the BT.709 conversion (which boils down to swapping in Kr = 0.2126 and Kb = 0.0722), and on that day the person making that change will be very thankful to have everything documented and explicit.
Example code with documentation
So after all my research and effort, this is the final code that we use today in OpenGL 4.1 contexts:
#version 410 core

/* Takes in 3-planar YUV 420 data in partial range 601 colour space
   and draws it to an RGB(A) surface.
   https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion
*/

uniform sampler2D t_texture_y;
uniform sampler2D t_texture_u;
uniform sampler2D t_texture_v;

in vec2 uvVarying;
out vec4 color;

const float Kr = 0.299;
const float Kg = 0.587;
const float Kb = 0.114;

// Pivoted because GLSL stores matrices as column major
const mat3 inverse_color_matrix = mat3(
    1.,           1.,                         1.,
    0.,           -Kb / Kg * (2. - 2. * Kb),  2. - 2. * Kb,
    2. - 2. * Kr, -Kr / Kg * (2. - 2. * Kr),  0.);

const vec2 digital_scale = vec2(
    255. / 219., // Luma
    255. / 224.  // Chroma
);

const vec2 digital_shift = digital_scale * -vec2(
    16. / 256.,  // Luma
    128. / 256.  // Chroma
);

void main()
{
    vec3 YCbCr = vec3(
        texture(t_texture_y, uvVarying).r,
        texture(t_texture_u, uvVarying).r,
        texture(t_texture_v, uvVarying).r
    );
    // Accounting for partial range (and shifting chroma to positive range)
    YCbCr = digital_scale.xyy * YCbCr + digital_shift.xyy; // as a MAD operation
    color = vec4(inverse_color_matrix * YCbCr, 1.0);
}
I tried to design it to be easy to follow with the Wikipedia article open next to it, using explicit naming throughout.
If you are worried about performance: I benchmarked this new code against the previous version on my machine and could not measure any meaningful difference between the two. Also, according to OpenGL's documentation, everything above main() should be constant expressions, computed only once, at compile time (i.e. when the shader is sent to the GPU).
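For completeness, here is a hypothetical host-side sketch (plain C with standard OpenGL calls; the function names and structure are mine, not Linphone's) of how the three decoder planes can be uploaded as single-channel textures and bound to the shader's three samplers:

/* Hypothetical sketch, not the Linphone code: upload the three Y'CbCr
 * planes as single-channel (GL_RED) textures and bind each one to its
 * sampler uniform. Assumes `prog` is the already-linked shader program. */
#include <GL/gl.h> /* or your platform's OpenGL loader of choice */

static GLuint upload_plane(const unsigned char *data, int w, int h)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    /* Plane rows are tightly packed; don't assume 4-byte alignment */
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, w, h, 0,
                 GL_RED, GL_UNSIGNED_BYTE, data);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}

void bind_yuv_planes(GLuint prog, const unsigned char *y,
                     const unsigned char *u, const unsigned char *v,
                     int w, int h)
{
    /* 4:2:0: the chroma planes are half-size in both dimensions */
    GLuint tex_y = upload_plane(y, w, h);
    GLuint tex_u = upload_plane(u, w / 2, h / 2);
    GLuint tex_v = upload_plane(v, w / 2, h / 2);

    glUseProgram(prog);

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, tex_y);
    glUniform1i(glGetUniformLocation(prog, "t_texture_y"), 0);

    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, tex_u);
    glUniform1i(glGetUniformLocation(prog, "t_texture_u"), 1);

    glActiveTexture(GL_TEXTURE2);
    glBindTexture(GL_TEXTURE_2D, tex_v);
    glUniform1i(glGetUniformLocation(prog, "t_texture_v"), 2);
}

A real renderer would of course create the textures once and update them each frame (e.g. with glTexSubImage2D) rather than re-allocating them. And thanks to GL_LINEAR filtering, the GPU transparently upscales the half-resolution chroma planes when the shader samples all three textures at the same uvVarying coordinate.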
Post Scriptum
After struggling so much to find those answers, I wanted to write this article in the hope that it would help the next poor soul looking to implement an accelerated video renderer. If that is you, I am glad I could help!
The version hosted on the Linphone website went through several rounds of review and editing to make it shorter and to tailor it to its intended audience. In the process, it lost most of my research story, which I thought might still be of interest to some people, so I got permission to upload the original draft here. Well, not exactly the original draft: I did take in some feedback from the people helping us with translation, and took the time to read it again and make some minor adjustments.