THIS EXTENSION DOES NOT BLOCK NORMAL YOUTUBE ADS, ONLY SPONSORS INSIDE VIDEOS. Use Adguard along with this extension to block both types.

SponsorBlock lets you skip over sponsors, intros, outros, subscription reminders, and other annoying parts of YouTube videos. SponsorBlock is a crowdsourced browser extension that lets anyone submit the start and end times of sponsored segments and other segments of YouTube videos. Once one person submits this information, everyone else with this extension will skip right over the sponsored segment. You can also skip over non music sections of music videos.

Access your data for, - Used to modify the YouTube webpage

This is open source and the entire database is public.

"Authentication Information": When you install the extension, it will generate a random "userID" that is used when submitting or voting. This allows you to appear on the leaderboard and helps determine reputation of submissions.

With this extension, you will automatically skip YouTube sponsors. You can also label content yourself for other users to skip if the video isn't already labeled. What's really cool about SponsorBlock is that their database is completely open! You get access to every single labeled video, with the start and end time of the sponsors. Their SQL database is open and updated often, so I used it to answer my next question:

Is it possible to detect sponsored content from YouTube videos?

I didn't know where to start, but I knew I had two paths: make a model learning on the audio, or make a model learning on the transcripts.

I felt like audios would have much more data than simple text (music, cadence...), so I started down that road. Without knowing anything about the subject, I searched around for audio classification algorithms, and found my way to the Panotti repo, which is based on a CNN (Convolutional Neural Network). It was used to successfully detect 12 different guitar effects at 99.7% accuracy, making 11 mistakes on 4000 testing examples! In my case I only needed two classes (sponsor and not sponsor).

I followed the instructions, learned how to use youtube-dl to download podcast highlights, and scraped Radiocentre's commercials database because until then, I hadn't learned about SponsorBlock. When learning on a single podcast, the model showed 99% accuracy on the training set and 95% on the test set, which means the model was able to detect commercial portions accurately on 95% of the episodes the network didn't see. When inputting different podcasts, the model wasn't able to detect commercials with confidence, outputting less than 60% confidence on predictions.

I figured that I simply needed to train the model on multiple different podcasts. Problem was it was getting super tough to handle the data: the dataset for 300 podcasts was about 50 gigs :(

The captions route

So, in order to get more "information" with less heavy data, I finally went with the transcripts! In the meantime I learnt about SponsorBlock, which made everything easier. I also looked around NLP (natural language processing) tutorials and found this excellent repo about sentiment analysis. Once I went through it, I decided to go for the Transformer model, which is based on BERT, a language model developed by Google.

To build the dataset, I still downloaded the videos with youtube-dl, except this time I fetched the automatic captions when they were available, giving me about 80k examples (took about 35 hours on 100 mb/s internet). To make things reliable, I chose for my training ads longer than 10s and shorter than 5 minutes, while making sure to keep only the videos that had only one ad. To have a balanced dataset, I took equal parts sponsor and content from the videos (if the ad was 3 min long, the content training example was also 3 min long).

The model yielded 93.79% testing accuracy! You can find the notebook I ran over here (it's really raw and undocumented though).

The next step was to try and automatically label a random video to see what would come out of it. So here's what popped up in my recommendations: a video by Linus Tech Tips. I took my machine for a spin and you can find the raw results here or on this spreadsheet.

It correctly detected the two sponsored segments (Ting segments) but also labelled some content as sponsor with a high confidence level. The mislabelled segments all contain upper case letters.
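The dataset rules described in this post (ads longer than 10 s and shorter than 5 minutes, only videos with a single ad, and a content example as long as the ad) can be sketched roughly as follows. This is not the code from the notebook: the `Cue` type, the function names, and the choice to take the content window right after the ad (or right before it when the video is too short) are all assumptions made for illustration.

```python
from dataclasses import dataclass

MIN_AD_SECONDS = 10        # ads must be longer than 10 s
MAX_AD_SECONDS = 5 * 60    # ...and shorter than 5 minutes


@dataclass
class Cue:
    """One caption cue (hypothetical format): start/end in seconds plus text."""
    start: float
    end: float
    text: str


def usable_segment(segments):
    """Return the video's single sponsor segment, or None if the video is skipped."""
    if len(segments) != 1:          # keep only videos with exactly one ad
        return None
    start, end = segments[0]
    if not (MIN_AD_SECONDS < end - start < MAX_AD_SECONDS):
        return None
    return start, end


def text_between(cues, start, end):
    """Join the text of the cues that lie fully inside [start, end]."""
    return " ".join(c.text for c in cues if c.start >= start and c.end <= end)


def build_examples(cues, segments):
    """Return [(text, label)]: one sponsor example (1) and one content example (0)."""
    segment = usable_segment(segments)
    if segment is None or not cues:
        return []
    ad_start, ad_end = segment
    duration = ad_end - ad_start
    video_end = max(c.end for c in cues)

    # Assumption: take an equally long content window right after the ad,
    # falling back to the window right before it.
    if ad_end + duration <= video_end:
        content_start = ad_end
    elif ad_start - duration >= 0:
        content_start = ad_start - duration
    else:
        return []

    sponsor_text = text_between(cues, ad_start, ad_end)
    content_text = text_between(cues, content_start, content_start + duration)
    if not sponsor_text or not content_text:
        return []
    return [(sponsor_text, 1), (content_text, 0)]
```

Calling `build_examples(cues, segments)` on each video and concatenating the results gives equal numbers of sponsor and content examples, which is the balance the post relies on.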