[MEI-L] Seeking help with Audiveris engine/Tesseract

Tim Crawford T.Crawford at gold.ac.uk
Sat Mar 21 16:04:53 CET 2020


Anna,

My sympathies are with you concerning setting up Tesseract and Audiveris. It seems a bit arcane.

What I did was to install tesseract via VietOCR3, which was developed to recognise Vietnamese script.
https://sourceforge.net/projects/vietocr/

My tesseract setup (v. 4.1.1) is now somewhat strange:

timc$ tesseract --list-langs
Error opening data file /Users/timc/Documents/ocr/VietOCR3/tesseract-ocr/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
List of available languages (5):
tessdata/deu
tessdata/eng
tessdata/lat
tessdata/osd
tessdata/vie

timc$ echo $TESSDATA_PREFIX
/Users/timc/Documents/ocr/VietOCR3/tesseract-ocr/
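
The error above suggests that Tesseract looks for eng.traineddata directly inside the TESSDATA_PREFIX directory, while the language list shows the files sitting one level down, in tessdata/. A minimal Python sketch for checking which layout a given prefix has follows. The function name find_traineddata and the languages checked are illustrative assumptions, not part of Tesseract or Audiveris.

```python
import os
from pathlib import Path


def find_traineddata(prefix, lang):
    """Report where <lang>.traineddata sits relative to a TESSDATA_PREFIX value.

    Returns "direct" if the file is in the prefix directory itself,
    "nested" if it is in a tessdata/ subdirectory, or None if it is absent.
    """
    base = Path(prefix)
    name = f"{lang}.traineddata"
    if (base / name).is_file():
        return "direct"
    if (base / "tessdata" / name).is_file():
        return "nested"
    return None


if __name__ == "__main__":
    prefix = os.environ.get("TESSDATA_PREFIX", "")
    if not prefix:
        print("TESSDATA_PREFIX is not set")
    else:
        # Languages chosen for illustration only.
        for lang in ("deu", "eng", "fra"):
            print(lang, find_traineddata(prefix, lang))
```

If a language comes back as "nested", pointing TESSDATA_PREFIX at the tessdata/ subdirectory itself may be worth trying; this is a guess based on the error message, not a confirmed fix.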

If this doesn’t help (and it hardly counts as useful instructions, I admit), I have found the main developer of Audiveris (Hervé Bitteur - herve.bitteur at audiveris.com) extremely responsive and helpful in the past. I suspect the same may be true of the developer of VietOCR3, Quan Nguyen - https://sourceforge.net/u/nguyenq/profile/ , though you will have to contact him/her through SourceForge.

Another admission: I haven’t done anything with this for a year or two. My idea was to set up a system which would take output data from Aruspix concerning location of bits of text, especially lyrics, and feed them as tiny tasks to Tesseract, then merge the recognised lyrics appropriately into the Aruspix MEI. As you might imagine, this is one of those projects that seems a lot simpler before you start, and I only got to the stage of recognising some ‘lyrics’ from 16c motets as text which I was (sometimes) able to identify using *very* approximate matching and a certain amount of manual guesswork with the Liber Usualis.
BTW all this was without any training for the fonts, styles, abbreviations and strange text-glyphs you find in 16c prints.
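
The "very approximate matching" step could be sketched along the following lines in Python, using the standard-library difflib module. This is not the code used in the experiment described above: the function names, the cutoff value and the sample texts are all assumptions made for illustration.

```python
import difflib


def normalise(text):
    """Lower-case and keep only letters and spaces, so punctuation noise is ignored."""
    kept = "".join(ch if ch.isalpha() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def best_match(ocr_text, candidates, cutoff=0.5):
    """Return (candidate, ratio) for the closest candidate, or None if below cutoff."""
    target = normalise(ocr_text)
    scored = [
        (difflib.SequenceMatcher(None, target, normalise(c)).ratio(), c)
        for c in candidates
    ]
    if not scored:
        return None
    ratio, text = max(scored)
    return (text, ratio) if ratio >= cutoff else None


if __name__ == "__main__":
    # Hypothetical reference texts and a hypothetical noisy OCR result.
    known = ["Ave Maria gratia plena", "Salve Regina mater misericordiae"]
    print(best_match("Aue Maria grutia plcna", known))
```

A low cutoff tolerates the heavy OCR noise to be expected from untrained 16c fonts, at the cost of more false matches, so some manual checking would still be needed.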

I think this is a valid Special Interest Sub-group idea for MEI, as it really is the next thing that is needed for corpus-building in early music.

Also, BTW, with a working tesseract installation, Audiveris does a pretty fair job with lyrics in ‘normal’ music.

Tim

Prof. Tim Crawford
Professorial Research Fellow in Computational Musicology
Department of Computing
Goldsmiths College
London SE14 6NW
U.K.

t.crawford at gold.ac.uk

On 21 Mar 2020, at 14:05, Kijas, Anna E <Anna.Kijas at tufts.edu> wrote:

Hello all,

I hope that everyone is doing well during this public health crisis. As I am stuck at home for the foreseeable future, I have a bit more time (no more driving to work!). I wanted to build and test out the Audiveris engine on my own machine to see if I can process sheet music and use the OMR to extract MusicXML. Has anyone worked, or is anyone working, with the Audiveris engine to extract music notation? Here is the link to the development guide: https://bacchushlg.gitbooks.io/audiveris-5-1/content/install/sources.html

I have built the engine on my machine and have installed the dependencies (JDK 8, Git, Tesseract, FreeType Library), but I am running into an issue with Tesseract. The Audiveris engine requires that you use Tesseract 3.04 language data instead of 4.0 (it won’t work with the newer version). I installed Tesseract on my machine, but when I add the 3.04 language data it doesn’t see it, and I keep getting the following error messages. I believe this is because I don’t have Tesseract set up correctly and/or don’t have the right version of the language files.

2020-03-21 10:00:33,027 WARN  [IMSLP273329]            TesseractOrder 166  | Could not initialize Tesseract with lang deu+eng+fra
2020-03-21 10:00:33,031 WARN  [IMSLP273329]                 SheetStub 845  | Error in performing [SCALE, GRID, HEADERS, STEM_SEEDS, BEAMS, LEDGERS, HEADS, STEMS, REDUCTION, CUE_BEAMS, TEXTS, MEASURES, CHORDS, CURVES, SYMBOLS, LINKS, RHYTHMS, PAGE] java.util.concurrent.ExecutionException: java.lang.NullPointerException
java.util.concurrent.ExecutionException: java.lang.NullPointerException
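
One thing that can be checked independently of Audiveris is whether a .traineddata file exists for every language named in the first warning (deu+eng+fra). The Python sketch below parses that log line and compares it with the contents of a language-data directory. The function names and the directory argument are assumptions for illustration, and a file being present does not show that it is the 3.04 version rather than 4.0.

```python
import re
from pathlib import Path

# Matches the warning text quoted in the log above.
LANG_RE = re.compile(r"Could not initialize Tesseract with lang (\S+)")


def requested_languages(log_line):
    """Extract the language codes from a 'Could not initialize' warning."""
    match = LANG_RE.search(log_line)
    return match.group(1).split("+") if match else []


def missing_languages(log_line, tessdata_dir):
    """List requested languages with no <lang>.traineddata file in tessdata_dir."""
    present = {p.stem for p in Path(tessdata_dir).glob("*.traineddata")}
    return [lang for lang in requested_languages(log_line) if lang not in present]
```

Calling missing_languages with the warning line and the path of the language-data directory returns the codes that have no matching file; an empty list means all requested files are at least present.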

If anyone is able to provide some assistance, please let me know.

Thanks!
Anna

Please note: Lilly Music Library librarian & staff are working remotely, beginning March 13, 2020, because of COVID-19 (https://coronavirus.tufts.edu/). Information about library services and support available during this time is available here: https://tischlibrary.tufts.edu/. Meetings and consultations will be conducted over Zoom.

Anna Kijas
Head, Lilly Music Library
Granoff Music Center
Tufts University
20 Talbot Avenue, Medford, MA 02155
Pronouns: she, her, hers
Book an appointment (https://tufts.libcal.com/appointments/kijas/lilly) | (617) 627-2846
_______________________________________________
mei-l mailing list
mei-l at lists.uni-paderborn.de
https://lists.uni-paderborn.de/mailman/listinfo/mei-l
