Section I: What Is RAW?
RAW footage is pure sensor data: ones and zeroes. When the sensor data is saved to the memory card or SSD, the camera discards none of the important image information; I'm talking about image detail such as the grooves in tree bark and skin pores. That kind of detail adds depth to your footage. RAW video comes in two forms, compressed and uncompressed, and compressed isn't always bad. Because the data rates are so high, images often still need to be compressed in order to be workable in post. We'll return to how digital cameras compress RAW data later down the page.
What else is RAW footage? It's highly flexible in post. One major benefit of this flexibility is that white balance can be altered in post with the scroll of a mouse or the stroke of a key, and the user can also choose the color and gamma space. The ISO can be manipulated as well, but that does not excuse the camera operator from properly exposing the sensor on set; by underexposing the sensor, the cinematographer creates problems down the post-production pipeline, such as noise reduction that consumes a great deal of processing power and time.
Finally, RAW footage is huge, and it takes up a ton of storage space. Two aspects that make the data rates so high are the lack of chroma subsampling and the high color bit depth.
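To get a feel for just how huge, here's a back-of-the-envelope estimate in Python. The resolution, bit depth and frame rate below are illustrative numbers, not any specific camera's actual spec:

```python
# Rough uncompressed RAW data-rate estimate (illustrative figures,
# not any particular camera's real-world numbers).
def raw_data_rate_gb_per_min(width, height, bits_per_pixel, fps):
    """Bytes of sensor data per minute, before any compression."""
    bytes_per_frame = width * height * bits_per_pixel / 8
    return bytes_per_frame * fps * 60 / 1e9  # decimal gigabytes

# A hypothetical 4K (4096x2160) sensor at 16 bits per photosite, 24 fps:
rate = raw_data_rate_gb_per_min(4096, 2160, 16, 24)
print(f"{rate:.1f} GB per minute")  # roughly 25 GB/min
```

Even before audio, metadata or overhead, that's tens of gigabytes per minute, which is why compression enters the picture at all.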
Section I: Review
RAW Data- Pure sensor information, composed of 1s and 0s; no compression. It's unreadable to the human eye and must be processed into a viewable video format, such as a Log image.
Section II: Chroma Subsampling
One of the reasons cinema camera owners love RAW is its lack of chroma subsampling, but not all camera owners have the storage space, or an absolute need, for two-gigabyte-per-minute video files. In light of that fact, compression wizards discovered that if color detail were removed from an image, the file size would decrease substantially while the perceived sharpness would remain essentially intact, because sharpness lives mostly in the luminance information, to which the eye is far more sensitive than it is to color.
Here's the notation they created: J:a:b. J is the width of the reference block, almost always four pixels, sampled across two rows; every pixel keeps its own luminance (light) sample. The letter a is the number of chrominance (color) samples in the top row of J pixels, and b is the number in the bottom row. Now, I want you to imagine these two rows of four pixels, or please look at the image above for visual reference. In order for the user to acquire the best color rendering possible, each pixel should retain its individuality... its own chroma (color) sample; the pixels should never be combined with one another. They're like greedy siblings; they each want their own colorfully painted room. What chroma subsampling does is force the pixels to share a room in order to save space on the memory card, discarding color information. For example, in the codec (compressor/decompressor) ProRes 4444 (top left of the photo), all of the pixels retain their individuality, while in ProRes 422, every two pixels share a chroma sample with each other. DSLR chroma subsampling is 4:2:0, the worst of the subsampling categories.
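To see how much data each scheme actually saves, the J:a:b sample counts can be tallied over the standard four-by-two reference block. A small Python sketch:

```python
# Relative data size of a J:a:b subsampled image versus full 4:4:4,
# counted over the standard 4x2-pixel reference block.
def relative_size(j, a, b):
    luma = 2 * j                  # one luma sample per pixel, two rows of J
    chroma = 2 * (a + b)          # two chroma channels (Cb and Cr)
    full = 2 * 4 + 2 * (4 + 4)    # the 4:4:4 reference: 24 samples total
    return (luma + chroma) / full

for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]:
    print(scheme, f"{relative_size(*scheme):.0%} of 4:4:4 data")
# (4, 4, 4) 100% of 4:4:4 data
# (4, 2, 2) 67% of 4:4:4 data
# (4, 2, 0) 50% of 4:4:4 data
```

So 4:2:0 literally halves the raw sample count while leaving every luminance sample untouched, which is exactly why the image still looks sharp at a glance.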
The reason 4:4:4 is good for post production is that it gives colorists a lot of leeway when they perform extreme color grades. If a colorist were to attempt pulling off an intense grade with 4:2:0 footage, banding lines would appear because there is simply not enough color information in the image, so you'd end up seeing the physical gradations in the tonal shifts, like the image in the next section. The banding lines are simply the computer attempting to create colors that do not exist; the limited color range is stretched out in an attempt to mimic the colors missing from the image. Another problem with 4:2:0 footage is that nasty block artifacts pop up when the colorist tries to pull a chroma key (an isolation of a specific color), because many of the pixels are being forced to share colors; again, there's simply not enough color information in the image. Finally, 4:2:0 footage can be a huge hassle for a VFX team that's working with green screen and has to perform compositing work (inserting objects into the scene that weren't there, or removing objects from the scene, like snow from a mountain or even an actor).
Section III: Color Bit Depth
There are two different types of color bit depth: bits per channel and bits per pixel. Bits per channel (bpc) refers to the amount of color information that can be displayed in each individual red, green and blue channel. Bits per pixel (bpp) is the sum of all three channels mixed into a single pixel; remember, a pixel needs the primary red, green and blue color channels in order to exist.
But, in order to understand it all, we have to go back to the base system. Humans said, "We have 10 fingers, so let's make our counting system based on 10!" And the base-10 system was born. Now, imagine you have 8 vertical rectangles stacked next to each other that can each be filled with a digit 0 through 9, and on top of those rectangles, each position from right to left is worth ten times the one before it. So, you start with 1, then 10, 100, 1,000 and so on; then, you multiply the digit in each vertical rectangle by the value on top of it and add all of those products together. With eight rectangles, you'd have almost a hundred million numbers to choose from.
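That place-value idea can be written out in a few lines of Python:

```python
# Base-10 positional notation: each place, right to left, is worth
# ten times the place before it (1, 10, 100, 1,000, ...).
def digits_to_number(digits):
    """Combine a list of decimal digits, most significant first."""
    total = 0
    for place, digit in enumerate(reversed(digits)):
        total += digit * 10 ** place
    return total

print(digits_to_number([4, 0, 9, 6]))  # 4*1000 + 0*100 + 9*10 + 6*1 = 4096
```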
A computer can read only 1s and 0s. It's binary, a base-2 system. Imagine that each channel can hold only 8 bits of information; that gives the channel 8 vertical rectangles that can each be filled with either a 0 or a 1. Instead of multiplying by powers of 10, the values on top of the vertical rectangles now increase by powers of 2. So, the numbers equate to 1, 2, 4, 8, 16, 32, 64 and 128, but since they're read from left to right, descending, it's actually 128, 64, 32, 16, 8, 4, 2, 1; when you add them all together, you get a maximum value of 255, which means each RGB channel can produce a number from 0 to 255, giving you 256 shades from black to white (counting 0 itself).
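Those bit weights are easy to verify in Python:

```python
# The eight bit weights of one 8-bit channel, read left to right.
weights = [2 ** p for p in range(7, -1, -1)]
print(weights)       # [128, 64, 32, 16, 8, 4, 2, 1]
print(sum(weights))  # 255 -- the highest value an 8-bit channel can hold
print(2 ** 8)        # 256 distinct values in total, counting 0
```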
The three channels are set independently, and all three could theoretically be 255 at the same time, which would equal pure white (R:255, G:255, B:255 = pure white). Basically, each channel has 256 values to choose from. So, R:25, G:50 and B:239 will create a really deep blue. In 8-bit binary, the channels would read as R: 00011001, G: 00110010 and B: 11101111, and that's it. But what about 10, 12 and 16 bit? You simply add more squares (bits) onto the existing 8: two more for 10 bit, another two for 12 bit, and four more on top of that for 16 bit.
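Here's that deep-blue example written out programmatically:

```python
# The deep-blue example from the text, shown in 8-bit binary.
r, g, b = 25, 50, 239
for name, value in [("R", r), ("G", g), ("B", b)]:
    print(name, format(value, "08b"))
# R 00011001
# G 00110010
# B 11101111
```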
So, because each RGB channel uses 256 values (0 through 255, with 0 included) at an 8-bit color depth, the pixel itself can produce 256 tonal values per channel, or roughly 16.7 million colors in total. In case you're interested, the formula for values per channel is 2 to the x power, and the formula for colors per pixel is 2 to the power of 3 times x, since the three channels multiply together. Remember, the more bits that are available per channel, the more colors there'll be per pixel, because the pixel is the summation of the three channels, and the tones will end up looking smoother and more realistic to the human eye. No nasty banding lines will occur in dark scenes. The bit depth needs to remain high when the image is captured so more color information can be used in post.
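The two formulas, sketched in Python for the common bit depths:

```python
# Distinct values per channel (2^x) and colors per pixel
# ((2^x)^3, i.e. 2^(3x)) for common capture bit depths.
for bits in (8, 10, 12, 16):
    per_channel = 2 ** bits
    per_pixel = per_channel ** 3  # same as 2 ** (3 * bits)
    print(f"{bits}-bit: {per_channel:,} values per channel, "
          f"{per_pixel:,} colors per pixel")
# 8-bit: 256 values per channel, 16,777,216 colors per pixel
# 10-bit: 1,024 values per channel, 1,073,741,824 colors per pixel
# ...
```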
Chroma subsampling and bit depth are often confused for the same thing, but they are most certainly different. Both are applied in camera, but when the final picture is delivered to multiple formats, such as movie theaters, broadcast stations and streaming services, bit depth is still controllable and can be reduced; the usual bit depth for broadcast and streaming is 8 bit. When both are utilized to their maximum abilities inside the camera, you (the artist) can create aesthetically powerful imagery with a ton of control inside your editing suite.
Section II & III Review: Chroma Subsampling and Color Bit Depth
Chroma subsampling – The digital discarding of color information, in camera, by lumping pixels together color-wise, which lowers the file size significantly. This loss of color fidelity cannot be fixed in post. The computer cannot represent that which does not exist.
Color Bit Depth- The amount of color values that can be displayed per color channel/per pixel. The lower the color bit depth, the more color banding is evident to the human eye.
NOTE: The higher the bit depth you capture to the memory card, the more tones within the color space you'll have to work with in post production. You can't add to the tonal range in post because you can't reproduce tones that weren't recorded in the first place. If you record with a bit depth of 16, you'll have 65,536 different shades from black to white per channel to work with in post, while with a bit depth of eight, you'll have only 256.
Section IV: Image Compression
Ok, so back to image compression of RAW images. Repeating from the beginning of the article, RAW is basically ones and zeroes; it's simply not readable to the human eye and, viewed directly, looks like little more than vague, shape-like outlines. What's important to note is that some form of compression is necessary in the post cycle; the image file needs to be placed inside a codec (a compressed/decompressed form of organized data), and the codec needs to be made readable via a container (the presentation of that organized data). With companies like RED, Sony and BlackMagic, the image is converted into a codec/container from the get-go.
Using RED as an example: RED chose wavelet compression, a ramping system of down-rezzed images from 1/16th up to full quality, in order to fit more footage on its cards and give its user base (those without high-speed computers) the option of immediate down-rezzing while viewing footage. For example, a 4K image can be viewed at 1/4th, 1/8th or even 1/16th of the original resolution without the need to physically down-sample the footage, because those lower resolutions already exist within the footage file. The lowest compression RED provides is 3:1. Should RED make pure, uncompressed footage available to its user base even though 3:1 is a visually lossless solution (the decompressed output looks identical to the original, even if it is no longer bit-for-bit identical)? It's not necessary but always debatable.
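As a rough illustration of the multi-resolution idea (this is not RED's actual wavelet math, which is far more sophisticated; it only shows the "half, quarter, eighth res" concept), you can think of each pyramid level as storing 2x2 block averages of the level above it:

```python
# Toy resolution pyramid: each level is a 2x2 block average of the
# previous one, so lower resolutions can be read without resampling
# the full-quality image. Purely illustrative.
def half_res(image):
    """Average every 2x2 block of a list-of-lists grayscale image."""
    return [
        [(image[y][x] + image[y][x + 1] +
          image[y + 1][x] + image[y + 1][x + 1]) / 4
         for x in range(0, len(image[0]), 2)]
        for y in range(0, len(image), 2)
    ]

full = [[x + y for x in range(8)] for y in range(8)]  # fake 8x8 frame
quarter = half_res(half_res(full))                    # 1/4 of the width and height
print(len(quarter), "x", len(quarter[0]))             # 2 x 2
```

A wavelet file stores the information needed to reconstruct every such level, which is why a player can grab a low-res preview straight from the file.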
With the BlackMagic cinema camera line, the case is a little different. Each clip is stored as a folder full of single-image .dng (container) files (files that, if opened, launch straight into Photoshop) which, combined together, make up the entire clip. This can be a risky way of storing footage, because if a single file is lost or corrupted, the clip is broken at that frame, but it still works really well for what it does.
ARRIRAW, on the other hand, is much like a film negative: it's fully uncompressed, so nothing is ever discarded, and it has to be processed into a codec/container in order to be readable. This is ARRI's way of staying true to its film community while moving into the digital era. The great part is that the user can compress the image into whatever format he or she chooses after the image is captured, even though the file size is huge; it's a trade-off, really.
Section IV Review: Compression
Even if the footage is fully uncompressed RAW, it still needs to be compressed in some form so it can be readable/viewable to the human eye. All cameras compress and store their RAW files uniquely.
Section V: What is Log?
Log is not RAW. What is Log? Log is short for logarithmic; it's a type of video format/contrast curve. This logarithmic image curve, which looks like an S with its belly pulled upward, was created back in the early '90s by Kodak and was named the Cineon curve. It was used to transfer film negatives into digital space and allowed the scanning technicians to more faithfully reproduce the dynamic range of the film negatives. The curve itself has 1024 steps (code values 0 through 1023), each representing a density increase of 0.002; if you look in DaVinci Resolve, you'll notice the waveform ranges from 0 all the way to 1023.
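A quick sanity check on those numbers:

```python
# Cineon code values run 0 through 1023; each step represents a
# film-density increase of 0.002, so the full scale spans a density
# range of about 2.046.
steps = 1023               # 1024 code values, so 1023 increments
density_per_step = 0.002
print(steps * density_per_step)  # 2.046
```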
On the S curve, all of the important detail is stored in the middle, which is pulled upward to the left, while the shadows (the toe of the curve) stay close to middle gray to preserve shadow detail, and the highlights (the shoulder of the curve) are pulled down slightly to preserve a little more of the highlight detail that would otherwise be lost if the image were processed in a linear fashion. Log images are formed inside the camera after the image has already been captured; the capture itself is linear. To be clear: with a cinema camera, you capture the image in a linear fashion, and it's saved logarithmically to the SSD. With a DSLR, you expose the image in a linear fashion, and the image is saved linearly to the memory card.
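To illustrate the idea in code (this is a generic, made-up log curve, not any manufacturer's actual transfer function), a linear-to-log encode can be sketched like this:

```python
import math

# A toy log encode: map linear scene values (0..1] onto a curve that
# spends more of the signal range on shadows and midtones. Generic
# sketch only; real camera log curves use their own published formulas.
def toy_log_encode(linear, stops=10.0):
    """Compress `stops` stops of linear range into a 0..1 signal."""
    linear = max(linear, 2.0 ** -stops)     # clamp to avoid log of zero
    return 1.0 + math.log2(linear) / stops  # 1.0 at full scale

print(round(toy_log_encode(1.0), 3))   # 1.0 -- the clipping point
print(round(toy_log_encode(0.18), 3))  # mid gray lands well up the curve
print(round(toy_log_encode(0.01), 3))  # a deep shadow still sits above zero
```

Notice how 18% gray, which would sit near the bottom of a linear scale, lands around three quarters of the way up the log signal; that redistribution is exactly what makes the footage look gray and flat before grading.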
All of RED's, ARRI's, BlackMagic's and Sony's images are built on a base Log layer. Anything else you see on top of the image (color, contrast and saturation) is part of the color space/gamma curve the company has applied on top of that base Log layer. So, that gray, flat image you see in post production from a BlackMagic, RED or ARRI is a Log image; basically, your camera's working just fine.
Section VI: Terminologies
Note: I’ll add more terminologies in the future.
RAW Footage (DSLR or Camcorder)
Highly compressed footage straight out of the camera (non-edited). It has a color bit depth of 8 and chroma subsampling of 4:2:0. It's hard to manipulate in post, and important image detail has already been discarded during compression as the footage was saved to the memory card; it is not true RAW.
RAW Footage (ARRI, RED, BLACKMAGIC, SONY)
Slightly compressed or fully uncompressed footage with high color bit depth (12 and up), no chroma subsampling and preserved image detail. The color and gamma space are changeable along with color temperature and ISO. It’s highly manipulatable and can be pushed to extremes in color grading.
Chroma Subsampling- The act of discarding color information by lumping pixels together in order to lower the file size.
Gamma Curve- Nonlinear encoding system used to encode luma values into the footage. Turns flat footage into normal-looking footage.
Color Space- An abstract mathematical model that describes the color values of the footage. Spaces such as sRGB (Internet), Rec. 709 (broadcast) and DCI-P3 (movie theaters) are used to conform footage.
Lossy Compression- Compression where image information is lost.
Lossless Compression- Compression where no information is lost.