Linguistics Services
Supporting the Military Linguistics Translation Mission.
For more than 25 years, ARTI has developed and managed linguistic data to construct translation algorithms, test systems for Machine Translation (MT), provide Optical Character Recognition (OCR), and perform information extraction in multiple languages. Today, we apply advanced AI tools and methodologies to speed language translation support to the Warfighter on the battlefield. Our services focus on three major areas: Linguistic Data Refinement, Software Engineering, and Linguistics Research. Our Linguistics Services include the following:
Linguistic Data Refinement and Processing
- Collecting text and document image data from websites, printed material, handwritten material, and other sources, and recording speech data from speakers reading prepared texts and/or speaking assigned roles in scripted scenarios in Arabic, Pashto, Dari, Farsi, Urdu, and other languages
- Editing, translating, aligning, linguistically annotating, organizing, and archiving linguistic data (text, speech or document images)
- Providing quality assurance for multilingual program data furnished as Government Furnished Information (GFI)
- Maintaining, updating, and describing linguistic data collected and archived
Software Engineering, Integration, and Test & Evaluation
- Integrating commercial, Government-owned, and experimental software components for OCR, MT, and/or speech recognition for testbeds and prototypes provided as GFI
- Designing and developing custom interfaces for research and evaluation applications provided as GFI
- Conducting testing and evaluation of commercial, Government-owned, or experimental multilingual processing components (such as document image processing algorithms, speech recognition algorithms, or automatic text alignment methods), and reporting on the results
Computational Linguistics Research Support
- Researching and developing hypotheses regarding performance of components and systems that are grounded in current linguistic and computer science theory and research
- Using Perl, C, C++, Java, and SQL to investigate, develop, or refine research tools, including the exploitation of Arabic morphological analyzers and document image segmenters
- Developing linguistically-motivated test sets based on knowledge of syntax, morphology, phonology, and linguistic theoretical principles
- Applying test sets in systematic comparison of machine translation and other processing systems
- Developing computational linguistic methods for constructing hybrid machine translation systems based on decomposing and recomposing university and other experimental machine translation components and systems
- Developing hypotheses about performance of hybrid machine translation that are grounded in current linguistic and computer science theory and research