ORCID

https://orcid.org/0000-0003-0247-6830

Access Type

Open Access Thesis

Document Type

thesis

Degree Program

Electrical & Computer Engineering

Degree Type

Master of Science in Electrical and Computer Engineering (M.S.E.C.E.)

Year Degree Awarded

2021

Month Degree Awarded

May

Abstract

Lecture videos are a valuable learning resource, and students commonly use online videos to explore various domains. However, some recorded lectures are posted to online platforms without post-processing because of technology and resource limitations. This work develops an intelligent system that uses modern deep learning techniques to automatically extract the essential content of a lecture video, namely the main instructor and the screen, across several scenarios. The extracted content is then combined and re-rendered into a new layout with a smaller file size than the original video. The approach can also save users the time and cost of manual video post-processing. The system combines state-of-the-art object detection models, an instructor-tracking step, a screen-correction algorithm, and other deep learning techniques to detect both the main instructor and the screen in a given video without a heavy computational burden.

There are four main contributions:

1. We built an intelligent video analysis and post-processing system to extract and reframe detected objects from lecture videos.

2. We proposed a post-processing algorithm that localizes the frontal human torso across a sequence of video frames.

3. We proposed a novel deep learning approach to distinguish the main instructor from other instructors or audience members in several complex situations.

4. We proposed an algorithm to extract the four edge points of a screen at the pixel level and correct the screen display in various scenarios.
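
To illustrate the screen correction mentioned in the fourth contribution, the following is a minimal sketch of one standard way to straighten a screen once its four corner points are known: a perspective (homography) transform. It is not taken from the thesis, whose actual algorithm is not described on this page. The function names, corner coordinates, and the 640x360 output size are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a standard four-point perspective correction,
# assumed here as a stand-in for the thesis's screen-correction step.

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return the 3x3 matrix that maps four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, point):
    """Apply the homography to one (x, y) point."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical detected screen corners (top-left, top-right,
# bottom-right, bottom-left) in pixel coordinates of the source frame.
corners = [(120, 80), (520, 110), (500, 390), (100, 340)]
# Hypothetical upright output rectangle of 640x360 pixels.
target = [(0, 0), (640, 0), (640, 360), (0, 360)]

H = homography(corners, target)
corrected = [warp_point(H, c) for c in corners]
```

Each detected corner maps onto the matching corner of the upright rectangle, up to floating-point error. Resampling a whole frame would apply the same mapping to every pixel; in practice a library routine such as OpenCV's `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective` is the usual choice for that step.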

DOI

https://doi.org/10.7275/22450152.0

First Advisor

Lixin Gao

Second Advisor

Russell Tessier

Third Advisor

Michael Zink
