Abstract:
Traditional video mashup and summarization methods assume that all video clips share a
common audio track of varying quality, so selecting the
best-quality audio is sufficient. In this work, we explore
a new scenario in which a single person plays each
instrument one by one, producing multi-view video
clips, each of which contains only partial audio,
e.g., a single instrument or vocal. To obtain the complete
audio, all partial audio tracks must be merged. In this
way, although the videos are recorded at different
times, they correspond to a common timeline in the
final mashup. The proposed framework automatically
recognizes the type of instrument for a given audio
clip and employs instrument-specific enhancements
before merging. To select video segments for the final
mashup, the framework automatically recognizes the
dominant instrument from the given audio clips and selects
the video capturing that instrument. The proposed
framework enables a single artist to play instruments,
add vocals, and act in the mashup video. The complete
framework has been implemented in the form of an
Android application.
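
As an illustration of the merging and video-selection steps described above, the following is a minimal Python sketch, not the framework's implementation. It assumes the partial audio tracks are already aligned to the common timeline and stored as mono sample arrays. The function names (`mix_tracks`, `dominant_per_segment`) are hypothetical, and per-segment RMS energy is used here only as a simple stand-in for the instrument recognition step.

```python
import numpy as np

def mix_tracks(tracks):
    """Sum time-aligned mono tracks; scale down only if the sum would clip.

    `tracks` maps a (hypothetical) track name to a 1-D array of samples.
    """
    length = max(len(t) for t in tracks.values())
    mix = np.zeros(length, dtype=np.float64)
    for samples in tracks.values():
        # Shorter tracks are treated as silent after their last sample.
        mix[:len(samples)] += samples
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix

def dominant_per_segment(tracks, segment_len):
    """For each segment, return the name of the track with the highest RMS.

    Assumption: RMS energy is a proxy for the "dominant instrument";
    the selected name identifies which video to show for that segment.
    """
    length = max(len(t) for t in tracks.values())
    names = list(tracks)
    choices = []
    for start in range(0, length, segment_len):
        energies = []
        for name in names:
            seg = np.asarray(tracks[name][start:start + segment_len],
                             dtype=np.float64)
            energies.append(np.sqrt(np.mean(seg ** 2)) if seg.size else 0.0)
        choices.append(names[int(np.argmax(energies))])
    return choices

# Toy example: a guitar take followed by a vocal take on the same timeline.
tracks = {
    "guitar": np.array([0.5, 0.5, 0.0, 0.0]),
    "vocal": np.array([0.0, 0.0, 0.8, 0.8]),
}
mixed = mix_tracks(tracks)
selection = dominant_per_segment(tracks, segment_len=2)
```

In this toy example the mix simply overlays the two takes, and the segment-wise selection switches from the guitar video to the vocal video halfway through.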