FEISGILTT 2014 Accepted Papers with Abstracts

Track | Authors | Title | Type | Abstract | Keywords | Topics
XLIFF Chase Tingley XLIFF 2.0: Will the Standard Succeed? Keynote This presentation is a follow-up to a 10-minute presentation I gave at the 2013 TAUS User Conference, titled "Why Localization Standards Fail". That presentation briefly examined some reasons why past localization standards efforts have encountered trouble with widespread adoption. It also suggested possible avenues for improving the successful adoption of future localization standards efforts. This presentation expands on some of the same themes, with a specific focus on XLIFF 2.0. The goal is to start a discussion of XLIFF 2.0 in the context of known types of standards failure, to help predict what issues it may encounter. This includes both an examination of the structure of the format itself and the ways it improves on XLIFF 1.2, as well as a look at what market factors may hinder or help drive its adoption. * An overview of the types of failures we have encountered in previous localization standards efforts. * A discussion of the ways in which XLIFF 1.2 was and was not successful in the context of these types of failures. * An analysis of the structure of XLIFF 2.0, how the changes since 1.2 reflect previous implementation issues, and how they will (or won't) address problems in adoption of the new standard. * An examination of the tools market and what obstacles to widespread adoption of XLIFF 2.0 may exist, along with efforts that can be taken by the TC or other parties to address them. standardization, interoperability, adoption Feedback to XLIFF 2.0 core and module features, Who is XLIFF for?
XLIFF Bryan Schnabel Demonstrating a Practical Application of Each XLIFF 2.0 Module Featured In this presentation each XLIFF 2.0 module will be explained and then demonstrated in a real-life application. XLIFF 2.0, Modules, Applications Experimental implementations of XLIFF 2.0
XLIFF Bryan Schnabel XLIFF 2.0 and its Applicability to Content Management Systems Featured XLIFF 1.2 was incorporated into some well-known Content Management Systems, and has been shown to greatly enhance their translation workflow. But some of XLIFF 1.2's limitations and its lack of conformance guidelines tended to leave CMS implementers too much room to invent proprietary ways of dealing with their needs. XLIFF 2.0 has improved core features, several new optional modules, and a robust set of processing requirements and constraints. This will make XLIFF 2.0 even more attractive to CMS implementers. This presentation will show each XLIFF 2.0 improvement through the lens of how it can and will help CMS implementers better support the translation/localization workflow. XLIFF 2.0, CMS, DITA, Application Applied XLIFF: Profiling and Subsetting of XLIFF 2.0 and still 1.x
CAL David Lewis and Leroy Finn A Linked-Data Vocabulary for the Linport Localisation Project Characteristics Featured The Language Interoperability Portfolio (Linport) project has worked to define an interoperable format for exchanging translation project characteristics between tools within the localisation value chain. It provides interoperable names and definitions for many important linguistic characteristics of source and target content, as well as characteristics of the expected tasks to be performed in the translation process, of the environment within which it should be performed and of the business context in which it should be conducted. An XML format allows textual descriptions of these characteristics to be exchanged alongside the content to be translated and other resources. However, many of the characteristics addressed are currently not subject to widespread consensus in the industry in terms of how they might be encoded. Therefore, while the Linport model enables the exchange of human-readable specifications of these project characteristics, specific project-by-project or client-by-client agreements are required to specify machine-readable encodings of these characteristics. For LSPs dealing with many projects and clients, this represents a barrier to integrating Linport project characteristics more systematically into their project management systems, where they can then be used directly in downstream processes and tools. In this paper we present a mapping of the current Linport model into a linked data vocabulary using the W3C’s standard Resource Description Framework (RDF) and Web Ontology Language formats. As the underlying RDF model can be rendered as XML, text (using the Turtle standard) or JSON, specifications in this vocabulary can easily be exchanged between a wide range of localisation tools.
Through namespaces, multiple inheritance of class definitions and property-based restrictions on class definitions, these languages provide a controlled mechanism for extending the Linport model. Moreover, since these definitions are readily published on the web and well supported by authoring and query tools, any specialisation of a characteristic can be published and reused between projects. This can encourage consistency between projects within a client or provider organisation. It also provides a decentralised mechanism for achieving consensus over different characteristics, in different fora, over time. We also describe in this paper the potential this approach has for supporting automated checking between the intended characteristics of the project and the RDF encoding of the provenance of the actual workflow execution. Acknowledgements: This work is partially funded by the European Commission as part of the FALCON project (grant number 610879) and partially by the Science Foundation Ireland (Grant 12/CE/I2267) as part of CNGL at TCD. Linport, Linked Data, RDF Applications of Linked Open Data in i18n and l10n, Building bridges between localization and linked data, The role of standardized formats
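The RDF mapping the abstract describes can be illustrated with a minimal sketch. The namespace URIs and property names below are hypothetical placeholders, not the published Linport vocabulary; the point is only to show how project characteristics become triples that can be serialised as Turtle (and equally as RDF/XML or JSON) for exchange between tools.

```python
# Sketch: encoding Linport-style project characteristics as RDF triples and
# serialising them as Turtle, using only the standard library. The vocabulary
# below is an illustrative stand-in, not the actual Linport vocabulary.

LINPORT = "http://example.org/linport#"   # hypothetical vocabulary namespace
EX = "http://example.org/projects/"       # hypothetical project namespace

triples = [
    (EX + "p42", LINPORT + "sourceLanguage", '"en"'),
    (EX + "p42", LINPORT + "targetLanguage", '"de"'),
    (EX + "p42", LINPORT + "subjectField", '"software"'),
]

def to_turtle(triples):
    """Render (subject, predicate, object) triples as Turtle statements."""
    lines = [f"@prefix lp: <{LINPORT}> ."]
    for s, p, o in triples:
        pred = "lp:" + p[len(LINPORT):] if p.startswith(LINPORT) else f"<{p}>"
        lines.append(f"<{s}> {pred} {o} .")
    return "\n".join(lines)

print(to_turtle(triples))
```

In practice an RDF library would handle serialisation and querying; the hand-rolled serialiser here just makes the triple model visible.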
CAL Yves Savourel ITS 2.0 in XLIFF 2 Public Progress Report + Discussion In continuation of the mapping offered between ITS 2.0 and XLIFF 1.2, the ITS Interest Group will be working on a mapping to represent the ITS 2.0 data categories in the new XLIFF 2 format. One of the new aspects of XLIFF 2 is its modularity, opening an opportunity for a specific ITS module integrated with the XLIFF standard itself. The presentation will focus on the development of such a module, using a concrete implementation based on the open-source Okapi XLIFF Toolkit. its 2.0, xliff 2.0, interoperability, content annotations, localization metadata Building bridges between localization and linked data, Open source reference implementations for i18n and l10n standards, The role of standardized formats
XLIFF Yves Savourel The Differences Between XLIFF 1.2 and XLIFF 2.0 Featured This presentation looks at the main differences between the 1.2 and 2.0 versions of XLIFF. Its goal is to help an implementer of 1.2 better understand what has changed and how to map or refactor the 1.2 data into the new 2.0 format. xliff 1.2, xliff 2.0, interoperability Experimental implementations of XLIFF 2.0
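The most visible change an implementer meets when mapping 1.2 data to 2.0 is the core hierarchy: the 1.2 file/body/trans-unit structure becomes xliff/file/unit/segment, under a new namespace and with source and target languages declared once on the root. A minimal sketch, parsed with the standard library (the document content is invented for illustration):

```python
# A minimal hand-written XLIFF 2.0 document parsed with the standard library,
# illustrating the 2.0 core structure (xliff > file > unit > segment) and the
# srcLang/trgLang attributes declared on the root element.

import xml.etree.ElementTree as ET

XLIFF2 = "urn:oasis:names:tc:xliff:document:2.0"
NS = {"x": XLIFF2}

doc = f"""<xliff xmlns="{XLIFF2}" version="2.0" srcLang="en" trgLang="fr">
  <file id="f1">
    <unit id="u1">
      <segment>
        <source>Hello world</source>
        <target>Bonjour le monde</target>
      </segment>
    </unit>
  </file>
</xliff>"""

root = ET.fromstring(doc)
for unit in root.iterfind(".//x:unit", NS):
    src = unit.find(".//x:source", NS).text
    tgt = unit.find(".//x:target", NS).text
    print(unit.get("id"), src, "->", tgt)
```

A 1.2-to-2.0 converter would essentially walk the old trans-unit elements and emit unit/segment pairs like the one above, while remapping or dropping 1.2 extensions.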
CAL Chris Hokamp Leveraging NLP Technologies and Linked Open Data to Create Better CAT Tools short presentation We present a prototype of a web-based tool for Computer Aided Translation (CAT). By making use of NLP technologies such as Named Entity Recognition, Syntactic Parsing, Chunking, Part-Of-Speech tagging, and vector-space methods for semantic similarity, we are able to provide end users with rich feedback about the linguistic content of both source and target segments. Translators are able to interact with linguistically coherent chunks of content, creating an intuitive and efficient post-editing workflow. By leveraging Linked Open Data resources like DBpedia, we are able to provide relevant supporting information beyond the content of the document being translated. Post-editing, Machine Translation, CAT, Linked Open Data Building bridges between localization and linked data, Creating, organizing, sharing and storing content and data, Discovering and extracting multilingual information, Integrating web-based CAT into CMS platforms, Linked data and ITS 2.0 use cases, Use cases for content analytics in localization workflows
XLIFF Dave Lewis, Rob Brennan, Alan Meehan and Declan O'Sullivan Using Semantic Mapping to Manage Heterogeneity in XLIFF Interoperability short presentation The XLIFF 1.2 standard features several extension points that have served to complicate the full interoperability of translation content meta-data between tools from different vendors. Many vendors’ specific extensions are in common use. XLIFF profiles promoted by individual large tool vendors or by consortia of smaller vendors (e.g. Interoperability Now!) attempt to reduce this complexity. However, as no one profile dominates, the overall result is that many XLIFF profiles are now in common use that extend the original standard in different ways. The XLIFF 2.0 standard attempts to control the evolution of extensions through the managed definition of new modules. However, until XLIFF 2.0 fully supplants the use of XLIFF 1.2 and its variants, tools vendors and language service providers will have to handle a range of different XLIFF formats and manage heterogeneity in the use of meta-data that impairs its use in automating steps in the localisation workflow. Managing the mapping of different XLIFF profiles to an internal representation therefore requires either extensive coding knowledge or the understanding and maintenance of a wide range of different XSL Transforms. In this work we describe an alternative approach to handling the design, implementation and maintenance of meta-data mappings using semantic web technologies. In previous work [lewis12] we described how a static XSLT mapping from XLIFF files at different points in the localisation workflow can be used to build up an interlinked provenance model of the process in the Resource Description Framework (RDF) schema standard of the semantic web.
This then allows standardised query endpoints to be used to conduct fine-grained workflow monitoring and process analytics, as well as actively curating translated text for retraining of machine translation engines used in the workflow. Here we explain how this approach can be expanded with techniques that define meta-data mappings in an explicit and therefore easily maintainable manner. With our approach, in order to support mapping publication, discovery and meta-data annotation, we represent the mappings as SPARQL Inference Notation (SPIN) Linked Data [Knublauch]. SPIN is an RDF-based vocabulary that is used to represent business rules and constraints via SPARQL queries. In our approach SPARQL is used to represent the executable mappings, and the CNGL GLOBIC model [brennan13] is used for the meta-data annotation. This approach enables the development of a common portfolio of mappings that can be combined in different ways to address the range of XLIFF mapping issues currently manifest in the marketplace and to enable runtime analytics of mapping usage. Semantic Mapping, Interoperability, Multilingual Web, XLIFF, RDF Interoperability with other standards, XLIFF and localization process management
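The core idea of this approach — mappings expressed as data rather than code — can be sketched in miniature. All profile names, attribute names and predicate URIs below are invented examples, not the GLOBIC or SPIN vocabularies themselves; the sketch only shows why a declarative rule table makes new XLIFF profiles a data-maintenance task instead of a coding task.

```python
# Minimal illustration of a declarative meta-data mapping: a rule table maps
# tool-specific XLIFF attribute names onto shared RDF-style predicates, so a
# new profile is supported by adding rules rather than writing new code.
# All names here are hypothetical examples.

MAPPING_RULES = {
    # (profile, source attribute) -> shared target predicate
    ("vendorA", "x-match-quality"): "http://example.org/l10n#matchQuality",
    ("vendorB", "mq"):              "http://example.org/l10n#matchQuality",
}

def map_metadata(profile, attrs):
    """Translate profile-specific attributes into shared predicates."""
    out = {}
    for name, value in attrs.items():
        predicate = MAPPING_RULES.get((profile, name))
        if predicate:
            out[predicate] = value
    return out

# Two different vendor profiles converge on the same predicate:
print(map_metadata("vendorA", {"x-match-quality": "85"}))
print(map_metadata("vendorB", {"mq": "85"}))
```

In the paper's actual approach the rules are SPARQL queries published as SPIN Linked Data, which additionally makes the mappings discoverable and queryable at runtime.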
CAL John Moran User Activity Data in a CAT tool - An industrial use case and a proposal for an open standard short presentation Computer-Aided Translation (CAT) tool instrumentation is a technique used to gather User Activity Data (UAD) from working translators. It is an idea borrowed from TransLog, an application used in the academic field of Translation Process Research. Rather than just recording the final translated sentence, the idea is to produce a log describing the mechanical process of how a translation was produced, ideally in a manner that facilitates later replay at the segmental and supra-segmental level. In this presentation we will present a case study to show how CAT-UAD is used to measure the utility of machine translation in terms of its impact on translation speed to inform word price discounts for post-editing at Welocalize, a CNGL industrial partner, and to steer MT engine development towards factors that slow the post-editing process. We will discuss how it can be used to assess MT utility for various content types using intermittent productivity tests or, with the permission of translators, for ongoing monitoring of post-editing speed. We will discuss how MT is just one of a number of technologies that can have a significant impact on translator productivity for various content types in terms of words per hour, and why A/B testing of language technology using CAT-UAD is important if LSPs are to respect the per-hour earnings of the freelance translators who work in the localization industry. Finally, we will present the XML format we use for CAT-UAD in iOmegaT, an adaptation of the well-known open-source CAT tool OmegaT, and make a case for why such data should become a new open standard for implementation in proprietary CAT tools. CAT-UAD, iOmegaT, OmegaT, MT post-editing The role of standardized formats
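The iOmegaT UAD format itself is not reproduced in the abstract, but the general shape of such a log can be sketched. The element and attribute names below are purely illustrative inventions, not the actual iOmegaT schema; the sketch only shows segment-level activity events serialised as XML so they can be replayed or aggregated later.

```python
# Hypothetical sketch of segment-level User Activity Data serialised as XML.
# Element and attribute names are illustrative, not the actual iOmegaT format.

import xml.etree.ElementTree as ET

# Each event records which segment was touched, what the translator did,
# and how long the action took in milliseconds.
events = [
    {"segment": "1", "action": "edit", "ms": "5300"},
    {"segment": "1", "action": "confirm", "ms": "700"},
]

root = ET.Element("uad", tool="example-cat")
for e in events:
    ET.SubElement(root, "event", e)

print(ET.tostring(root, encoding="unicode"))
```

Aggregating such events per segment gives the post-editing speed measurements (words per hour, time per segment) the case study relies on.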
XLIFF Lucia Morado Vázquez and Silvia Rodríguez Vázquez Teaching XLIFF to translators and localisers short presentation The XML Localisation Interchange File Format (XLIFF) is the main standard for the interchange of localisation data during the localisation process, and the most popular and widely used Computer-Assisted Translation (CAT) tools already support its current version, 1.2. However, the potential final users of the format, i.e. translators, still have limited or no knowledge of the standard and the possible advantages of its adoption (Anastasiou 2010). With a view to bridging this knowledge gap, we have been introducing XLIFF as a topic of study in the translation and localisation studies curricula for the last four years in four different European universities, both at undergraduate and postgraduate levels, thus fulfilling one of the missions of the OASIS XLIFF Promotion and Liaison subcommittee. In this paper, we aim to share our experience in teaching XLIFF to translation and localisation students: the curricula design, the topics covered, the labs undertaken and the areas that we improved and modified based on our experience over this period of time. We always ground the design of our XLIFF module on two main axes: the previous technical background of the students, and the level of specialised knowledge that we want them to acquire. On the first axis, we gather information about the students' experience with technical aspects of text formats, mainly markup languages. In the first iterations of the XLIFF module, we realised that most of the problems that students faced were not directly related to XLIFF itself but to a lack of knowledge of basic XML concepts. Therefore, we opted to tackle that constraint by adding extra tutorials and labs on XML prior to the introduction of XLIFF. The second axis is determined by the maturity of the students and the level of specialisation of their degree.
We take these factors into account because students pursuing an undergraduate diploma in translation might not need, or might not be prepared to assimilate, in-depth technical concepts, while this might not be the case for postgraduate students in localisation or translation technologies Master's degrees. The main topics covered in our XLIFF modules are the following: history and development of XLIFF, the extraction-merge paradigm, benefits, relevance for the translator and localiser, support in CAT tools, and the main structural elements and attributes. We combine lectures with hands-on sessions and practical labs. In some of them we try to recreate real-life situations, presenting students with corrupted XLIFF files that they need to test, analyse, and eventually fix. After each module we collect students' feedback, which has also helped us to shape and redesign our contents. The module is always well received, and students emphasise that it helps them to better understand the internal working mechanisms of CAT tools, as well as to overcome their reluctance to inspect and modify the code. All the aforementioned topics will be presented both from our didactic point of view and from the perspective of our students' learning experience. XLIFF, Teaching, Translation studies, Localisation, Who is XLIFF for?
CAL Renat Bikmatov, Leonid Glazychev and Serge Gladkoff Previewing ITS 2.0 metadata directly in a web browser short presentation The demo presents a universal JavaScript-based content parser and preview tool that allows users to easily view localization metadata embedded in the content. The tool fully supports the new Internationalization Tag Set (ITS) 2.0 standard for metadata introduced by W3C. One can directly preview metadata in both HTML and XML files using a web browser. Logrus is currently looking at extending the tool to support ITS 2.0 for XLIFF 2.0. No file format conversion is required. The tool represents further development of the open-source metadata preview tool codenamed "WICS", which was developed in 2013. A number of sample files to illustrate the usage of various types of ITS 2.0 metadata have been created, and all these features will be demonstrated live. ITS, ITS 2.0, Internationalization Tag Set, Metadata Preview, JavaScript, XLIFF, HTML, HTML 5.0, Localization, Localisation Open source reference implementations for i18n and l10n standards, The role of standardized formats
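The kind of scan such a preview tool performs can be sketched with the standard library: walk an HTML fragment and collect ITS 2.0 local metadata, i.e. the native translate attribute and the its-* attributes ITS 2.0 defines for HTML. This sketch is illustrative only (the demo tool itself is JavaScript, and a full implementation would also handle global rules and XML).

```python
# Sketch: collect ITS 2.0 local metadata (translate and its-* attributes)
# from an HTML fragment, the kind of scan a metadata preview tool performs.

from html.parser import HTMLParser

class ITSCollector(HTMLParser):
    """Record every element carrying ITS 2.0 local markup."""

    def __init__(self):
        super().__init__()
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        its = {k: v for k, v in attrs if k == "translate" or k.startswith("its-")}
        if its:
            self.annotations.append((tag, its))

sample = ('<p>Ask about <span its-term="yes">XLIFF</span> and '
          '<code translate="no">srcLang</code>.</p>')
p = ITSCollector()
p.feed(sample)
print(p.annotations)
```

A preview tool would then highlight each annotated element in place rather than just listing it.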
XLIFF Yves Savourel The Okapi XLIFF Toolkit short presentation XLIFF 2.0 comes with a large set of constraints and processing requirements aimed at making the format more interoperable than its predecessors. While writing and reading XLIFF 2.0 documents is easy, making sure all requirements are met may be more difficult to achieve. The Okapi XLIFF Toolkit is a Java open-source library that makes this a lot easier. It offers various components to validate, read, write and manipulate XLIFF content without much headache. The library also offers simple ways to manipulate valid custom extensions, for example to attach extra metadata to the extracted content. The presentation will explore the toolkit and demonstrate how to use it with concrete sample applications. xliff 2.0, localization, content extraction Open source reference implementations for i18n and l10n standards
CAL Yves Savourel QuEst Integration in Okapi short presentation The QuEst project provides a way to estimate the quality of machine translation candidates, while the Okapi Framework offers various ways to prepare documents for translation, including pre-translating the text with various machine translation engines. Integrating QuEst into the Okapi workflow is an ideal scenario that demonstrates how to apply a research project to a production environment. The project, funded by the European Association for Machine Translation, is ongoing and set to be completed at the end of the summer. The presentation shows the current state of the work and how standards such as ITS, XLIFF and TMX are key to performing the integration. MT Confidence, Quality Estimation, ITS, XLIFF, TMX Interoperable MT and text analytic confidence scores, Open source reference implementations for i18n and l10n standards, The role of standardized formats
XLIFF David Filip and Bryan Schnabel XLIFF 2.1 Roadmapping Info Session Preconference Session This is intended as an OASIS XLIFF TC public info-session. The agenda points are: 1) Status of the ITS 2.0 mapping/module. Reporter: Yves Savourel. 2) Plans for advanced validation support in 2.1. Reporter: Felix Sasaki. 3) Merging 2.1 with the media type registration template. Reporter: David Filip. 4) Errata and other quick wins for 2.1. Reporter: Bryan Schnabel. XLIFF, roadmap, 2.1, validation, NVDL, Schematron, ITS, OASIS, W3C Feedback to XLIFF 2.0 core and module features, Interoperability with other standards, ITS 2.0 as an XLIFF 2.0 extension -> ITS 2.0 as a XLIFF 2.1 module, Proposals of XLIFF 2.x modules
XLIFF David Filip and Bryan Schnabel XLIFF 2.x Roadmapping Info Session Preconference Session This is intended as an OASIS XLIFF TC info session. The agenda: 1) Feature proposals for XLIFF 2.x releases. Moderator/facilitator: Bryan Schnabel; possible topics: integration of Linked Open Data, TBX, TMX, RDF, etc. 2) QA features focus. Reporter: an invited external reporter TBC. That reporter will have worked closely with the TC in advance in order to map out a QA-oriented XLIFF 2.0, 2.1 and 2.x profile. 3) Roadmap for 2.2, 2.3, and beyond. Moderator/facilitator: Kevin O'Donnell, who will bring a strawman release schedule. XLIFF, 2.x, OASIS, W3C, QA, linked data, linked open data, big data, language resources, ETSI, ISO, requirements gathering Applied XLIFF: Profiling and Subsetting of XLIFF 2.0 and still 1.x, Feedback to XLIFF 2.0 core and module features, Interoperability with other standards, Proposals of XLIFF 2.x modules, Who is XLIFF for?, XLIFF and localization process management, XLIFF and management of memory and terminology matches, XLIFF beyond description: Creative or Unexpected XLIFF Use Cases
XLIFF Felix Sasaki Plans for advanced validation support in XLIFF 2.1 Public Progress Report + Discussion Advanced validation support is one of two main features considered for XLIFF 2.1. There are advanced static validity and processing requirements in XLIFF 2.x, and XML Schema 1.0 is simply not expressive enough to provide automated tests. validation, NVDL, Schematron, OASIS, W3C Feedback to XLIFF 2.0 core and module features, Interoperability with other standards, Proposals of XLIFF 2.x modules
XLIFF David Filip Merging XLIFF 2.1 with the media type registration template Public Progress Report + Discussion XLIFF 2.1 should register a media type with IANA. media type, MIME, registration, IETF, IANA, XLIFF, .xlf Feedback to XLIFF 2.0 core and module features, Interoperability with other standards, Proposals of XLIFF 2.x modules
XLIFF Bryan Schnabel Errata and other quick wins for XLIFF 2.1 Public Progress Report + Discussion Collecting errata and quick wins in a public discussion errata, XLIFF Feedback to XLIFF 2.0 core and module features, Interoperability with other standards, Proposals of XLIFF 2.x modules
XLIFF Bryan Schnabel Feature proposals for XLIFF 2.x releases Public Requirements Gathering Collecting and discussing feature proposals for future XLIFF 2.x releases requirements gathering, features, XLIFF Feedback to XLIFF 2.0 core and module features, Interoperability with other standards, Proposals of XLIFF 2.x modules
XLIFF TBD QA features focus for XLIFF 2.x Public Requirements Gathering What QA features are supported in XLIFF 2.0, what will be available through 2.1 and what are the gaps that will need to be covered in 2.x releases? QA, Quality, Quality Assurance, requirements gathering, features, XLIFF Feedback to XLIFF 2.0 core and module features, Interoperability with other standards, Proposals of XLIFF 2.x modules
XLIFF Kevin O'Donnell Roadmap for XLIFF 2.2, 2.3, and beyond Public Progress Report + Discussion What should be the roadmap for modular XLIFF 2.x updates? roadmap, release schedule, XLIFF Feedback to XLIFF 2.0 core and module features, Interoperability with other standards, Proposals of XLIFF 2.x modules
CAL Phil Richie Translation Quality and RDF position statement + discussion Content Analytics use cases
CAL Yves Savourel Linked Data in Translation Kits position statement + discussion As localization workflows become more sophisticated and the availability of Web services related to content analysis grows, it is now possible to use a wealth of linked data during the localization/translation process. But such usage faces several challenges in both how to apply the metadata to the content and how to make them accessible in a useful way to the end user. This presentation will go through examples of how the Okapi Framework, an open-source set of components for localization, tries to make use of linked data through standards such as ITS 2.0 or XLIFF. It will show some of the concrete issues related to annotations and user accessibility, as well as explore some of the benefits such metadata can bring. Content Analytics use cases
CAL David Lewis Turning Localisation Workflow Data to Linked Data position statement + discussion Localisation workflows are able to benefit from standard tool exchange formats such as XLIFF for text being translated, TMX for translation memories and TBX for term bases, as well as TIPP and Linport for packaging such formats with project-level meta-data. Increasingly, however, localisation actors are concerned with managing linguistic assets across language pairs for tasks such as terminology quality control, SEO, and assembling corpora for custom MT engines. Linked data technology may play an important role in improving such asset management tasks, especially when the assets span several organisations. This however requires well-defined mappings so that XML-based interchange formats can be reliably synced with asset management and sharing through linked data. Content Analytics use cases
CAL Alan Melby Linport and RDF position statement + discussion Content Analytics use cases
CAL Andrejs Vasiļjevs Terminology resources in the ecosystem of linked data services for language professionals position statement + discussion We will show how recent advances in the field of terminology automation are bringing together the data and services needed by language professionals. Applying state-of-the-art methods in natural language processing, exploiting the benefits of cloud-based technologies, and standardizing APIs and formats for data exchange provide a new level of interoperability and automation of traditional processes. Content Analytics use cases
CAL Ioannis Iakovidis The Localisation Web position statement + discussion Content Analytics use cases
CAL Víctor Rodríguez Doncel Towards high quality, industry-ready Linguistic Linked Licensed Data position statement + discussion The application of Linked Data technology to the publication of linguistic data promises to facilitate interoperability of such resources and has led to the emergence of the so-called Linguistic Linked Data Cloud (LLD), in which linguistic data is published following the Linked Data principles. Three essential issues need to be addressed for such data to be easily exploitable by language technologies: i) appropriate machine-readable licensing information is needed for each dataset, ii) minimum quality standards for Linguistic Linked Data need to be defined, and iii) appropriate vocabularies for publishing Linguistic Linked Data resources are needed. We propose the notion of Licensed Linguistic Linked Data (3LD), in which different licensing models might co-exist, from totally open to more restrictive licenses through to completely closed datasets. Content Analytics use cases
CAL David Lewis & Felix Sasaki Requirements Gathering: scenarios for linked data in localization Public Requirements Gathering Content Analytics use cases