Microsoft Stream: VTT Cleaner with PowerShell

As drusome mentioned in a Microsoft TechCommunity post, the VTT Cleaner website is no longer available. The site is referenced in the Stream docs under: https://docs.microsoft.com/en-us/stream/portal-add-subtitles-captions#get-just-the-text-from-a-transcript

The VTT Cleaner removes all VTT metadata from a VTT file and returns the plain text. I think I heard in an Ignite video that Marc Mroz created this page.

The TechCommunity post describes how to clean the file with Excel or with another website. Someone also posted the Google Cache link, from which I could download the page and view its source. Then I decided to write a PowerShell script for cleaning a VTT file. It is hosted on my GitHub under: https://github.com/tomka75/Tools/tree/master/MicrosoftStream/VTTCleaner

What does the script do? A VTT file from Microsoft Stream has the following format:

WEBVTT

NOTE duration:"00:00:29.5060000"

NOTE language:en-us

NOTE Confidence: 0.50167996

2f1aed67-1122-4899-a3a5-922b6e5b4d36
00:00:12.200 --> 00:00:13.080
Hi Microsoft Stream.

NOTE Confidence: 0.6553465

4644c530-e8eb-49be-88c5-d2931662ded5
00:00:16.270 --> 00:00:21.895
Welcome to the show
today on this

NOTE Confidence: 0.6553465

b2bdeca5-304b-4447-8fa5-a556bedf717c
00:00:21.895 --> 00:00:22.520
station.

NOTE Confidence: 0.5076108

6e006136-c996-4745-bb65-975f276dd09b
00:00:23.760 --> 00:00:27.400
We are in a tenant
with sausages.

Call the script with:

.\Clean-MicrosoftStreamVttFile.ps1 -File MicrosoftStreamVideoTranscriptDemo.vtt

This produces a new output file named MicrosoftStreamVideoTranscriptDemo.vttcleaned with this content:

Hi Microsoft Stream. Welcome to the show today on this station. We are in a tenant with sausages.
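
For readers who want to see the idea rather than download the script, here is a minimal sketch of how such a cleanup could work. It is not the code from the GitHub repository: the function name ConvertFrom-StreamVtt and its -Line parameter are hypothetical, and it assumes the file follows the layout shown above (a WEBVTT header, NOTE lines, a GUID cue identifier, and a timestamp line containing the --> arrow used by the WebVTT format before each caption).

```powershell
# Hypothetical helper for illustration; not the script from the repository.
function ConvertFrom-StreamVtt {
    param(
        # Lines of the VTT file, e.g. as returned by Get-Content.
        [string[]] $Line
    )

    # Cue identifiers in Stream transcripts look like GUIDs.
    $guidPattern = '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'

    $text = foreach ($rawLine in $Line) {
        $t = $rawLine.Trim()
        if ($t -eq '')                  { continue }  # blank separator line
        if ($t -ceq 'WEBVTT')           { continue }  # file header
        if ($t -cmatch '^NOTE(\s|$)')   { continue }  # duration, language, confidence notes
        if ($t -match $guidPattern)     { continue }  # cue identifier
        if ($t -match '-->')            { continue }  # timestamp line
        $t                                            # everything else is caption text
    }

    # Join the remaining caption lines into one block of text.
    $text -join ' '
}

# Example call (assumes the demo file is in the current directory):
# ConvertFrom-StreamVtt -Line (Get-Content .\MicrosoftStreamVideoTranscriptDemo.vtt -Encoding UTF8) |
#     Set-Content .\MicrosoftStreamVideoTranscriptDemo.vttcleaned
```

The sketch filters line by line instead of parsing cues, which is enough for this layout but would also drop any caption line that happens to contain an arrow or start with NOTE in capitals.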

It contains only the text from the VTT file, without any other information. Try it! Give me feedback!