Testing computer transcription for archive files

We are testing how well automated transcription works with audio files from the archive. Speech that mixes Sanskrit with English is a challenge for automated transcription. As an example, here is an unedited transcript of an audio file from the University of Pennsylvania lectures. It was produced from an uploaded MP3 file using the transcription feature built into Word in Microsoft Office 365. That transcription service has a file size limit of 300 MB.

rk_sharma_mahabharata_oral_tradition_4february2009.mp3 (12.9 MB, 13:45)
Comments on the tradition of oral poetry in the Mahābhārata.

Begin automated transcript:

Audio file

00:00:03 Speaker 1
And I am required to.
00:00:08 Speaker 1
Tell them a story.
00:00:12 Speaker 1
And that too they want in a.
00:00:15 Speaker 1
Not not in a process, right?
00:00:19 Speaker 1
But in a poetic style, in a musical style.
00:00:24 Speaker 1
So how should I?
00:00:27 Speaker 1
How should I behave?
00:00:28 Speaker 1
What can I do?
00:00:34 Speaker 1
Unfortunately, there was no printing press.
00:00:40 Speaker 1
Of course, the writing was there.
00:00:42 Speaker 1
I have no doubt about it that.
00:00:45 Speaker 1
The scripts were there, writing was there.
00:00:52 Speaker 1
Our forefathers.
00:00:55 Speaker 1
Believed and not only Indian forefathers, but that’s why the tradition also increased.
00:01:02 Speaker 1
Entire block Herblock had one culture.
00:01:06 Speaker 1
That is my feeling.
00:01:09 Speaker 1
What we say Southeast Asia.
00:01:12 Speaker 1
What we say Europe.
00:01:15 Speaker 1
Somewhere they were all living together.
00:01:17 Speaker 1
There is no doubt about it.
00:01:20 Speaker 1
That story will someday it will come out.
00:01:24 Speaker 1
It has not yet fully come out, only a part of it thanks to some script that that story.
00:01:29 Speaker 1
Has come out.
00:01:30 Speaker 1
To some extent.
00:01:32 Speaker 1
For that, we should be grateful to subsequently.
00:01:37 Speaker 1
Because about some script, people believed first of all that decision.
00:01:42 Speaker 1
This is just portal language or something concocted by crafty governments.
00:01:49 Speaker 1
That is the word used by them.
00:01:51 Speaker 1
I committed to memory bilingual the student of via class.
00:01:56 Speaker 1
Thrifty Brahmins I I made with nothing to myself.
00:02:01 Speaker 1
That was their first impression.
00:02:05 Speaker 1
Later on they could.
00:02:07 Speaker 1
William Jones
00:02:09 Speaker 1
Who was The Who was the founder of Royal Asiatic Society?
00:02:13 Speaker 1
Of Bengal.
00:02:15 Speaker 1
He was.
00:02:18 Speaker 1
Scholar of Greek Latin.
00:02:21 Speaker 1
Also Balto, Slavic he knew quite a lot.
00:02:25 Speaker 1
Then when he.
00:02:27 Speaker 1
Came to know something about some script, didn’t know much.
00:02:33 Speaker 1
He said, no, this is a noted language.
00:02:35 Speaker 1
It is a super super language, something like that.
00:02:42 Speaker 1
We should see that anywhere MacDonald’s book or winter next book, everybody quotes him.
00:02:49 Speaker 1
You want.
00:02:51 Speaker 1
Borough has quoted him instance with language Burroughs book.
00:02:55 Speaker 1
You see?
00:02:57 Speaker 1
T Boro Sanskrit language in the title of that book.
00:03:01 Speaker 1
Very good book.
00:03:08 Speaker 1
Then he said that this language is so intimately, not not.
00:03:17 Speaker 1
Not occasionally not, not casually, but, but so scientifically it is connected with all languages of the of Europe.
00:03:30 Speaker 1
And so ** *** saved so much and thereafter.
00:03:35 Speaker 1
Now some cities.
00:03:37 Speaker 1
He thought he was telling Dave, Angie.
00:03:41 Speaker 1
Some St does not belong to Southeast Asia only.
00:03:47 Speaker 1
Sounds great.
00:03:50 Speaker 1
Entire Europe.
00:03:52 Speaker 1
Entire Europe entire Southeast Asia.
00:03:55 Speaker 1
I don’t know.
00:03:55 Speaker 1
In Australia I was, I saw something like Hanuman.
00:04:00 Speaker 1
Some word of the name of this street, something like that.
00:04:04 Speaker 1
So these people were staying together somewhere, somewhere, somehow.
00:04:10 Speaker 1
And then.
00:04:12 Speaker 1
Somehow they were, they thought, separated.
00:04:15 Speaker 1
When they got separated and how they got separated, nobody knows.
00:04:19 Speaker 1
Of course somebody has added one book, Atlantis, that is a that’s a novel.
00:04:27 Speaker 1
That that gives some speculative ideas.
00:04:32 Speaker 1
It gives something.
00:04:34 Speaker 1
Who how this America became?
00:04:38 Speaker 1
Separate from from Europe and out that we don’t know.
00:04:43 Speaker 1
So this mahabharatha?
00:04:48 Speaker 1
Transmitted orally in the beginning.
00:04:51 Speaker 1
So how how to present?
00:04:55 Speaker 1
How to present these stories?
00:04:59 Speaker 1
The stories were already their stories, mostly related to.
00:05:07 Speaker 1
Guardians of communities.
00:05:10 Speaker 1
The sages.
00:05:12 Speaker 1
The leaders, the heroes.
00:05:18 Speaker 1
And they have to be.
00:05:22 Speaker 1
Communicated to a large.
00:05:25 Speaker 1
Gathering of audience, how how to do that?
00:05:30 Speaker 1
So as we learn Alphabet.
00:05:35 Speaker 1
To learn any language or any particular particular community language so they had to learn.
00:05:46 Speaker 1
Quite a lot of poetic formulas.
00:05:49 Speaker 1
They had to master.
00:05:51 Speaker 1
So that’s why, Kim Kuwada, Sanjaya, you will find so many.
00:05:55 Speaker 1
Chimacum, sanjaya.
00:05:57 Speaker 1
Kim Akhuwat pandawa
00:05:59 Speaker 1
And so on so forth.
00:06:01 Speaker 1
And when you see here and you come to.
00:06:06 Speaker 1
The Sunder Kanda of this.
00:06:14 Speaker 1
Here also you will find.
00:06:23 Speaker 1
Yeah, yeah.
00:08:06 Speaker 1
So when.
00:08:08 Speaker 1
When Hanuman goes to.
00:08:12 Speaker 1
Lanka and when you see Sita.
00:08:17 Speaker 1
That description, as you find in.
00:08:25 Speaker 1
The keywords are almost the same here also that we will locate slowly and gradually.
00:08:35 Speaker 1
And the same formula is used.
00:08:41 Speaker 1
Sue diva.
00:08:43 Speaker 1
In the loop back channel.
00:08:46 Speaker 1
As someone goes in search for Sita.
00:08:50 Speaker 1
So Sue Diva was also wondering here, there and everywhere in search for Nala and Damante and he sees the empty in a pilot.
00:09:02 Speaker 1
Serving working as a maidservant.
00:09:08 Speaker 1
So the description of Tamil NT in the palace and description of C to India.
00:09:14 Speaker 1
Ashoka garden.
00:09:16 Speaker 1
Just exactly the same words, the same formula.
00:09:25 Speaker 1
That is the poetic formula for formulas were fixed for.
00:09:31 Speaker 1
Fixed for several.
00:09:34 Speaker 1
Contexts contextual formulas.
00:09:38 Speaker 1
To describe a war, what would be the?
00:09:42 Speaker 1
Mala mala
00:09:43 Speaker 1
Amlan pankajam
00:09:46 Speaker 1
That is, that is a very.
00:09:50 Speaker 1
Very widely used formula for Marla Garland.
00:09:58 Speaker 1
And if you have.
00:10:00 Speaker 1
Rudra taraganj atha.
00:10:06 Speaker 1
If somebody is going on hunting so he will be described as Rudra Taraganj.
00:10:13 Speaker 1
And Kalidasa will bring about some more beauty.
00:10:26 Speaker 1
Pacha me.
00:10:27 Speaker 1
So Pinaki numb he will beautify it further.
00:10:29 Speaker 1
What is the translator?
00:10:32 Speaker 1
Which one?
00:10:34 Speaker 1
From the same point.
00:10:38 Speaker 1
That, of course, the same situation in the sky.
00:10:43 Speaker 1
Why the Rohini nakshatra is there?
00:10:47 Speaker 1
Rohini nakshatra is
00:10:50 Speaker 1
Being chased by rigor Shira nakshatra.
00:10:55 Speaker 1
Henry Garcia is being chased by Ardra nakshatra.
00:10:59 Speaker 1
And Ardra nakshatra presiding deity Lord Shiva.
00:11:04 Speaker 1
So Lord Shiva is hero.
00:11:07 Speaker 1
He’s changing.
00:11:14 Speaker 1
And is following ruining his dear love whatever?
00:11:19 Speaker 1
Whatever it is.
00:11:22 Speaker 1
So Rudra angle so that had been brought in the form of a story in the Puranas.
00:11:31 Speaker 1
It is a better job with a logic device to let us know the astronomical astronomical facts.
00:11:40 Speaker 1
In a in a in a funny funny style.
00:11:44 Speaker 1
Because if you learn it in a funny style, you will never forget it.
00:11:48 Speaker 1
You will remember it all the time.
00:11:51 Speaker 1
So, so the maharatha is the entire variable.
00:11:56 Speaker 1
The presentation through.
00:11:58 Speaker 1
The set sets of poetic formulas, as you will see, slowly and gradually, and in that sometimes once in a while.
00:12:09 Speaker 1
Particular details have also to they they commit blunders.
00:12:14 Speaker 1
Also some eminent critical editors commit blunders.
00:12:18 Speaker 1
That story we will.
00:12:20 Speaker 1
We will tell you later on.
00:12:23 Speaker 1
But the the main the the base or the foundation of that.
00:12:31 Speaker 1
Oral tradition is.
00:12:33 Speaker 1
The repetition.
00:12:35 Speaker 1
Of words, repetition of.
00:12:38 Speaker 1
Similes repetition of adjectives.
00:12:42 Speaker 1
Repetition of vocatives.
00:12:46 Speaker 1
Everywhere you will find that here also you see.
00:12:51 Speaker 1
This Viagra is bandua et cetera, et cetera, et cetera.
00:12:57 Speaker 1
The question atrapada her anti mambety hasanpur.
00:13:03 Speaker 1
That also.
00:13:06 Speaker 1
Swara Krishna in Gita.
00:13:09 Speaker 1
You we have.
00:13:11 Speaker 1
This is several places you have.
00:13:14 Speaker 1
You gave for a shampoo somewhere, it is.
00:13:18 Speaker 1
Treated like that.
00:13:21 Speaker 1
So the point is formulas are there and there is.
00:13:25 Speaker 1
There is a freedom on the part of the oral oral poet to bring about certain certain modifications in that in those formulas.
00:13:37 Speaker 1
And so he goes on bringing about those modifications also.
00:13:44 Speaker 1
So let us see.