Smart Cropping for Video: A Tool for Displaying Video at Any Aspect Ratio

by Brian Chirls

Smart-Cropped Video displayed on a 5.33:1 screen at the IFP Made in NY Media Center in Brooklyn, NY. (Photo: Colleen Cambier)

View the demo »
View source code »

Since the beginning of cinema, film and video have been composed with a fixed frame: theatrical films are now often presented at 1.85:1, while television is broadcast at 16:9 for high definition or 4:3 for standard definition. But today, video is likely to be seen on devices with unusual aspect ratios, or with multiple aspect ratios when rotated or resized. Some Android tablets are 8:5 (or 5:8), the iPhone is 3:2 (or 2:3), and desktop browser windows can be of any dimension.

So far, there have been two approaches to handling the problem of multiple aspect ratios:

1. Crop the image or employ “pan-and-scan”. The original authors may not be involved in re-composing the frame for each platform.

2. Apply a letterbox or pillarbox, i.e. black bars above and below the frame or on either side of it. The original image composition is preserved, but not all of the display device is used and the subject of the image may appear smaller than intended.

While these approaches mostly worked for cinema on television, for the web it is impractical to create multiple videos, and the negative effects of letterboxing are magnified on mobile devices held vertically or in tall and narrow browser windows. For interactive video, the problems multiply when a producer has to contend with player controls, menus and text, which can become unusably small at non-optimal dimensions.

Smart Video Cropping

On the left, a pillarboxed video playing on an iPad. On the right, a video on the same device with an intelligent crop, guided by the filmmaker. Still images are from Cutie & the Boxer, directed by Zachary Heinzerling. © Ex Lion Tamer Inc.

I sought to create a new solution that allows video to fill any screen or window, while preserving the video creator’s intended composition of the frame and the full view of the subject. The tool applies equally well to highly cinematic documentaries, cinéma vérité approaches, broadcast journalism and “talking-head” interviews.

The experiment pairs web video with a series of timeline-bound rectangles that define the minimum required area of a shot or sequence. As the video plays, the frame is cropped to fill the display area while guaranteeing that the selected subject remains in frame, as close to the original composition as possible. In extremely wide or tall displays, minimal letterboxing or pillarboxing may be used to preserve large subject areas.
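The placement described above can be sketched as a small calculation. This is a minimal illustration under stated assumptions, not the code from the project: the function names (`fitCrop`, `axisOffset`), the rectangle fields (`x`, `y`, `width`, `height`) and the returned shape are all hypothetical. The idea is to scale the video up until it covers the display, unless that would push the required rectangle off screen, in which case the scale is capped and small black bars remain.

```javascript
// Sketch: place a video so it fills the display while a required
// rectangle stays visible. All names here are hypothetical.

function clamp(value, lo, hi) {
  return Math.min(Math.max(value, lo), hi);
}

// Offset along one axis, in display pixels, of the scaled video's origin.
function axisOffset(videoSize, rectStart, rectSize, displaySize, scale) {
  const scaledSize = videoSize * scale;
  const centered = (displaySize - scaledSize) / 2;
  if (scaledSize <= displaySize) {
    // The whole video fits on this axis: center it, leaving black bars.
    return centered;
  }
  // The video overflows: stay as close to centered as possible, but keep
  // the required rectangle on screen and never expose empty space.
  const lo = Math.max(displaySize - scaledSize, -rectStart * scale);
  const hi = Math.min(0, displaySize - (rectStart + rectSize) * scale);
  return clamp(centered, lo, hi);
}

function fitCrop(video, rect, display) {
  // Scale at which the video covers the display completely.
  const coverScale = Math.max(
    display.width / video.width,
    display.height / video.height
  );
  // Largest scale at which the required rectangle still fits the display.
  const rectScale = Math.min(
    display.width / rect.width,
    display.height / rect.height
  );
  const scale = Math.min(coverScale, rectScale);
  return {
    scale,
    offsetX: axisOffset(video.width, rect.x, rect.width, display.width, scale),
    offsetY: axisOffset(video.height, rect.y, rect.height, display.height, scale),
  };
}

// Example: a 1920x1080 video shown on a portrait 1080x1920 display.
const placement = fitCrop(
  { width: 1920, height: 1080 },
  { x: 1200, y: 100, width: 500, height: 900 },
  { width: 1080, height: 1920 }
);
console.log(placement);
```

Centering first and clamping second is one way to read "as close to the original composition as possible"; other choices, such as keeping the subject at the same relative position in the frame, would be equally reasonable.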

You can view the demo or the source code. Read on to find out how you can try out the tool on your own video or contribute to future iterations of the tool.

The Technology and Other Notes About This Tool’s Creation

I was able to implement the smart cropping example in a few days with HTML video, CSS and JavaScript. The video element will play an MP4 or a WebM video file and can be scaled or positioned just like any other element. The use of CSS transforms signals the browser to use the graphics hardware for fast and smooth scaling. There are a few modest JavaScript modules that work together: one to load the rectangle data, one to implement custom play controls, and one to monitor window size and device orientation and scale the video. I used Popcorn.js to switch to the right data for each shot in the video at the right time. Thanks to all of these open technologies, the final result works in all major modern browsers on both desktop and mobile devices.
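To make the CSS transform step concrete, here is a rough sketch of turning a scale and an offset into a transform string. The helper names (`toCssTransform`, `applyPlacement`) and the shape of the `placement` object are assumptions made for illustration; the project's own modules may be organized quite differently.

```javascript
// Sketch: turn a computed placement into a CSS transform string.
// toCssTransform and applyPlacement are hypothetical names.

function toCssTransform(placement) {
  // Round so the string stays short and stable between resizes.
  const x = Number(placement.offsetX.toFixed(2));
  const y = Number(placement.offsetY.toFixed(2));
  const s = Number(placement.scale.toFixed(4));
  // With the transform origin at the top-left corner, the video is
  // scaled first and then moved, so video pixel (px, py) lands at
  // (x + s * px, y + s * py) in the display.
  return `translate(${x}px, ${y}px) scale(${s})`;
}

// Browser-only helper: apply the transform to a <video> element.
function applyPlacement(videoElement, placement) {
  videoElement.style.transformOrigin = "0 0";
  videoElement.style.transform = toCssTransform(placement);
}

console.log(toCssTransform({ scale: 0.5, offsetX: -120, offsetY: 0 }));
// prints "translate(-120px, 0px) scale(0.5)"
```

In a page, `applyPlacement` would typically be called again whenever the window is resized or the device is rotated, for example from `resize` and `orientationchange` event listeners.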

The video responds well to changes in window size and device orientation, keeping the subject in view and preserving much of the intended composition, especially in shots with one subject and lots of extra room. Some shots still required some letterboxing or pillarboxing if the area of interest took up most of the frame, but the black bars are much smaller than with the traditional approach. Using the keyframing built into Popcorn Base Plugin, I was able to animate a pan for a long, continuous shot in which the subject moved across the frame. A very slow pan is subtle enough to simulate a tracking shot. If the pan is too fast, the lack of depth and parallax motion is obvious and ugly, reminiscent of a bad pan-and-scan effect.
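The animated pan amounts to moving the required rectangle between keyframes over time. The sketch below uses plain linear interpolation as a stand-in; it is not the Popcorn.js keyframing API, and the `keyframes` shape and the `rectAtTime` name are hypothetical.

```javascript
// Sketch: linear interpolation of a required rectangle between keyframes.
// This is a plain stand-in, not the Popcorn.js keyframing API.

function lerp(a, b, t) {
  return a + (b - a) * t;
}

// keyframes: array of { time, rect }, sorted by time, with at least one entry.
function rectAtTime(keyframes, time) {
  const first = keyframes[0];
  const last = keyframes[keyframes.length - 1];
  // Hold the first or last rectangle outside the keyframed range.
  if (time <= first.time) return { ...first.rect };
  if (time >= last.time) return { ...last.rect };
  // Find the pair of keyframes that surrounds the current time.
  let i = 1;
  while (keyframes[i].time < time) i += 1;
  const from = keyframes[i - 1];
  const to = keyframes[i];
  const t = (time - from.time) / (to.time - from.time);
  return {
    x: lerp(from.rect.x, to.rect.x, t),
    y: lerp(from.rect.y, to.rect.y, t),
    width: lerp(from.rect.width, to.rect.width, t),
    height: lerp(from.rect.height, to.rect.height, t),
  };
}

// Example: the subject drifts from the left to the right over ten seconds.
const pan = [
  { time: 10, rect: { x: 0, y: 100, width: 600, height: 800 } },
  { time: 20, rect: { x: 1000, y: 100, width: 600, height: 800 } },
];
console.log(rectAtTime(pan, 15).x); // prints 500
```

Spreading the keyframes further apart in time slows the pan down, which is the case described above as subtle enough to pass for a tracking shot.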

I decided to add a minimum and maximum aspect ratio feature, allowing a cut to other parts of the frame or zooming out to a master shot only when the window is very tall or very wide. If one subject is on the left of the frame and a secondary subject is all the way on the right, it might make sense to cut between them but only if the two areas don’t overlap. But if the window is close to the original aspect ratio, we can avoid a cut that wouldn’t make sense and stick to the “30 degree rule.”
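One way to picture the aspect-ratio limits is as an ordered list of areas, each valid only within a range of display ratios. The sketch below is an assumption about how such a rule could look: the field names (`minAspect`, `maxAspect`), the `pickArea` function and the overlap check are hypothetical and are not taken from the project's data format.

```javascript
// Sketch: choose which area of interest applies at the current display
// aspect ratio. Field names (minAspect, maxAspect) are hypothetical.

// True when two rectangles share any area; cutting between overlapping
// areas is the case the text above suggests avoiding.
function rectsOverlap(a, b) {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

// Returns the first area whose aspect-ratio range contains the display's
// ratio, or null when none applies.
function pickArea(areas, displayWidth, displayHeight) {
  const aspect = displayWidth / displayHeight;
  for (const area of areas) {
    const min = area.minAspect === undefined ? 0 : area.minAspect;
    const max = area.maxAspect === undefined ? Infinity : area.maxAspect;
    if (aspect >= min && aspect <= max) return area;
  }
  return null;
}

// Example: isolate the left subject only in tall, narrow windows.
const areas = [
  { name: "left subject", maxAspect: 0.8, rect: { x: 100, y: 150, width: 500, height: 800 } },
  { name: "two shot", rect: { x: 100, y: 150, width: 1700, height: 800 } },
];
console.log(pickArea(areas, 1080, 1920).name); // prints "left subject"
console.log(pickArea(areas, 1920, 1080).name); // prints "two shot"
```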

This form of responsive video is not without its problems and challenges, in both technical process and the outcome. The aggressive cropping doesn’t accomplish much for some scenes, especially when the subject takes up the whole frame. When it’s necessary to preserve the black bars, it can be worse if the size of the bars varies from one shot to the next. I suspect the approach will be most effective when it is part of the plan when the video is first shot. The lack of a decent graphical authoring tool at this stage makes defining the areas of interest difficult and tedious. I had to manually type exact pixel coordinates into a data file, repeatedly reloading the web page to refine the results.

There are a few technical limits as well, though they can likely be improved with time. Animated movement within a shot can be choppy on certain mobile browsers. Hopefully this will be less of a problem as more mobile devices are built with powerful GPUs and if Google can manage to improve uniformity of Android distributions. Unfortunately, there isn’t a way to make this work in a browser on an iPhone, which will only play a video at fixed full screen. The video would have to be embedded in an app, perhaps with PhoneGap. Lastly, the HTML video element is occasionally one frame behind the reported time. This is a problem at cuts between shots, when the video shifts to a new position a fraction of a second before it shows the first frame of that shot. More research is required to determine whether this can be addressed without a change to the browser’s video API.

The key to the smart cropping tool is the filmmakers’ ability to select a rectangle that defines a “minimum viewable area” in every shot of a web video. Still images are from Cutie & the Boxer, directed by Zachary Heinzerling. © Ex Lion Tamer Inc.

Make Your Own

The code for this experiment is released with an open source license, so you can try the smart cropping tool for your own video. The HTML includes a video element that you can point to your MP4 or WebM files. There is a data file that specifies the scene parameters for each shot. Every shot has start and end times, with four pixel coordinates of the rectangle that contains the subject.
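As a rough picture of what per-shot data could look like, here is a sketch with two shots and a lookup by playback time. The field names (`start`, `end`, `left`, `top`, `right`, `bottom`) and the helper functions are assumptions made for illustration; check the instructions on GitHub for the actual format.

```javascript
// Sketch: a hypothetical shape for the per-shot data, plus a lookup.
// Field names are assumptions; the real data file may differ.

const shotData = JSON.parse(`[
  { "start": 0,   "end": 8.5, "left": 1200, "top": 100, "right": 1700, "bottom": 1000 },
  { "start": 8.5, "end": 21,  "left": 300,  "top": 200, "right": 1500, "bottom": 900 }
]`);

// Returns the shot that covers the given playback time, or null.
function findShot(shots, time) {
  return shots.find((shot) => time >= shot.start && time < shot.end) || null;
}

// Basic sanity check: times are ordered and the rectangle lies inside
// the video frame with a positive width and height.
function isValidShot(shot, videoWidth, videoHeight) {
  return (
    shot.start < shot.end &&
    shot.left >= 0 &&
    shot.top >= 0 &&
    shot.right <= videoWidth &&
    shot.bottom <= videoHeight &&
    shot.left < shot.right &&
    shot.top < shot.bottom
  );
}

console.log(findShot(shotData, 12));
console.log(shotData.every((shot) => isValidShot(shot, 1920, 1080)));
```

A check like `isValidShot` can catch typing mistakes early, which helps when coordinates are entered by hand.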

Visit GitHub for the source code and more detailed instructions. Let me know if you post an example! Use the hashtag #povtech, or leave a comment below.

The Future

There are a number of opportunities for further experimentation and development. A graphical interface for editing areas of interest along a video timeline and saving the data would make the authoring process easier and allow for faster iteration. The selective range of aspect ratios is helpful, but there is room for more advanced rules for which areas are triggered at different sizes and aspect ratios. It might be useful to have multiple nested areas for increasingly extreme aspect ratios. It might also be useful to have rules based on absolute pixel dimensions: a target area such as a face or text may need to be magnified on low-resolution devices to be clear. Finally, I’d like to see this applied to a variety of videos and interactive scenarios.

Please let me know what you think of this tool. Did you try it? Share a link. You can comment below, use the hashtag #povtech or email us at filmmakers@pov.org.

Thanks again to Zachary Heinzerling for granting permission to showcase the smart cropping video tool with a clip from his Academy Award-nominated documentary, Cutie & the Boxer, which is scheduled to air on POV in the spring.

Brian Chirls
Brian Chirls is the Digital Technology Fellow at POV, developing digital tools for documentary filmmakers, journalists and other nonfiction media-makers. The position is a first for POV and is funded as part of a $250,000 grant from the John S. and James L. Knight Foundation. Follow him on Twitter @bchirls and GitHub.