MACHINE TRANSLATION, MACHINE AIDED TRANSLATION, MULTILINGUAL CONTENT MANAGEMENT, AND TRANSLATION TECHNOLOGY (Q3.2)

This article discusses machine translation, machine-aided translation, multilingual content management and translation technology.
On the one hand, Machine Translation (MT) can be defined as translation where the initiative lies with a computer system, working either autonomously (FAHQT = Fully Automatic High Quality Translation) or with a user who is asked to pre-edit the input, post-edit the output, or answer clarification and disambiguation dialogues. Machine Aided Translation (MAT), by contrast, is human translation supported by a computer system. The support comes in the form of lexical data, grammatical help, translation memory, domain information and organizational support.
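As a rough illustration of the translation memory support mentioned above, the following minimal sketch looks up the stored segment closest to a new sentence so that a human translator can reuse or adapt its translation. The memory contents, the function name `best_match` and the 0.75 threshold are assumptions made for this example, and the similarity measure is Python's general-purpose `difflib.SequenceMatcher`, not the matching algorithm of any particular MAT tool.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segments mapped to stored translations.
TRANSLATION_MEMORY = {
    "Save the file before closing.": "Guarde el archivo antes de cerrar.",
    "Open the file menu.": "Abra el menú Archivo.",
}


def best_match(segment, memory, threshold=0.75):
    """Return (source, translation, score) for the closest stored segment.

    Returns None when no stored segment reaches the similarity threshold,
    in which case the translator starts from scratch.
    """
    best = None
    for source, target in memory.items():
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


if __name__ == "__main__":
    # A near-duplicate of a stored segment yields a "fuzzy match" to post-edit.
    print(best_match("Save the files before closing.", TRANSLATION_MEMORY))
```

The human stays in control here: the system only proposes a candidate, which is the defining difference between MAT and MT as described above.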

On the other hand, there are Content Management (CM) systems, which hold information mostly in the form of more or less structured text documents. Such a system provides mechanisms for storing and retrieving content data, but it may also support document indexing, distributed document editing, version management, and the generation of different views and guided tours.
In our global society, content inevitably has to be managed in several languages. In particular, there is often a need to maintain versions in different languages of what is, from a content point of view, essentially one document. The creation and maintenance of such documents is the core of a multilingual CM system (MCM system).
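The idea of one document maintained in several language versions can be sketched as a small data structure. This is only an illustration under assumed names (`MultilingualDocument`, `LanguageVersion`, `stale_languages`); it is not taken from any existing CM system. It records which source revision each translation was made from, so that outdated versions can be flagged after the source changes.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageVersion:
    text: str
    based_on_revision: int  # source revision this version was translated from


@dataclass
class MultilingualDocument:
    doc_id: str
    source_language: str
    source_text: str
    revision: int = 1
    # Maps a language code (e.g. "es") to its LanguageVersion.
    versions: dict = field(default_factory=dict)

    def update_source(self, new_text):
        """Store a new source text and bump the revision counter."""
        self.source_text = new_text
        self.revision += 1

    def add_translation(self, language, text):
        """Record a translation of the current source revision."""
        self.versions[language] = LanguageVersion(text, self.revision)

    def stale_languages(self):
        """List languages whose translation predates the current source revision."""
        return sorted(
            lang
            for lang, version in self.versions.items()
            if version.based_on_revision < self.revision
        )


if __name__ == "__main__":
    doc = MultilingualDocument("welcome", "en", "Hello")
    doc.add_translation("es", "Hola")
    doc.add_translation("de", "Hallo")
    doc.update_source("Hello there")
    doc.add_translation("es", "Hola a todos")
    print(doc.stale_languages())
```

Tracking the revision per language version is one simple way to combine the version management and multilingual maintenance described above.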

The purpose of the present project is to investigate how robust and efficient language technology can be used to enhance the functionality of CM systems. In addition, we want to apply state-of-the-art technology, such as invasive software composition and ontologies, in order to create prototypes of future CM systems and integrate multilingual components into existing systems. Finally, we aim to apply the findings to multilingual learning systems, i.e. MCM systems with learning content, which gives our research a concrete end-user focus.

 

Finally, translation technology is changing with the information age, which is producing a global world in which languages play an important part. As a result, translation systems are steadily gaining importance and improving with each technological advance.

 

http://www.eamt.org/summitVIII/papers/kenny.pdf

http://www.msnbc.msn.com/id/4352578/

http://en.wikipedia.org/wiki/Machine_translation

http://www.foreignword.com/Technology/technology.htm

http://www.hutchinsweb.me.uk/JDoc-1978.pdf

 

 

CHARACTERISTICS OF THE TRANSLATION TASK (Q3.1)

As FEMTI says, the characteristics of the translation task refer to the information flow intended for the output, from the point of view of the agent (human or otherwise) who receives the translation. Translation itself is understood as the interpretation of the meaning of a text and the subsequent production of an equivalent text, also called a translation, that communicates the same message in another language.

1. Assimilation: The ultimate purpose of the assimilation task is to monitor a large volume of texts produced by people outside the organization, usually in several languages.

  • Document routing or sorting: The purpose of document routing is to scan incoming translated documents quickly in order to send them to the appropriate points for further processing or storage.

  • Information extraction or summarization: The purpose of information extraction or summarization is to extract some portions of the translated text, either manually or automatically, for subsequent processing or storage. Information extraction is typically concerned with filling templates by identifying atomic elements of events. In contrast, summarization aims to provide a self-contained and internally cohesive text which serves as a selective account of the original.

  • Search: The goal of a search process is to identify a set of documents that together can satisfy an information need. Subtasks include refinement of the searcher’s understanding of their need, refinement of the expression of that need as a query, and recognition of relevant documents. Automated components of search systems typically accomplish only portions of the required task, leaving the searcher to assess factors (e.g., veracity and completeness) that would be difficult to detect by automated means. Searchers with limited proficiency in the languages in which the documents are written will require translation support to accomplish information need refinement, query reformulation, and relevant document recognition.
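Document routing is the simplest of the assimilation subtasks to sketch in code. The example below is a minimal illustration, assuming a hypothetical routing table (`ROUTES`) and a plain keyword count; real routing systems would use far richer classification. It shows why even rough translations can be adequate for this task: only a few key terms need to survive translation.

```python
# Hypothetical routing table: destination -> keywords expected in the translated text.
ROUTES = {
    "legal": {"contract", "liability", "court"},
    "finance": {"invoice", "payment", "budget"},
}


def route_document(translated_text, routes, default="general"):
    """Pick the destination whose keywords occur most often in the text."""
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip(".,;:!?()\"'").lower() for w in translated_text.split()]
    scores = {
        destination: sum(1 for w in words if w in keywords)
        for destination, keywords in routes.items()
    }
    if not scores:
        return default
    best = max(scores, key=scores.get)
    # Fall back to the default destination when no keyword was found at all.
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(route_document("The invoice and payment are overdue.", ROUTES))
```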

2. Dissemination: The ultimate purpose of dissemination is to deliver to others a translation of documents produced inside the organization.

  • Internal or in-house dissemination: In the case of internal / in-house dissemination the translations are sent to other people in the same organization, who share aspects of the culture, terminology, and domain knowledge to some extent. The most important feature for this type of task is speed: how fast the system is and whether it can keep up with the demand for input.
  1. Routine internal dissemination: The recipients of translation perform a relatively routine task that does not require much variability in the translation service.
  2. Experimental internal dissemination: The recipients of translation perform a rather variable task, and hence may request translations in new domains, genres, or extensions.

  • External dissemination – publication: In the case of external dissemination / export / publication the translations are sent to other people in other organizations, who may not share aspects of the culture, terminology, and domain knowledge.
  1. Single client external dissemination: The recipients of the translation all have essentially the same needs; their translations do not require specific tailoring.
  2. Multi-client external dissemination: Since the recipients of the translation have different needs and capabilities, translation has to be tailored to them.

3. Communication: The ultimate purpose of the communication task is to support multi-turn dialogues between people who speak different languages. The translation quality must be high enough for painless conversation, despite possible syntactically ill-formed input and idiosyncratic word and format usage.

  • Synchronous communication: In the case of synchronous or interactive communication, the interaction between the participants occurs in real time.
  • Asynchronous communication: In the case of asynchronous or delayed communication the interaction between participants occurs with interruption, for example by email.

http://www.issco.unige.ch:8080/cocoon/femti/printable.html

RESEARCH TOPICS (Q2)

There are many research groups, each with its own projects and aims.
The first research group I am going to talk about is the National Centre for Language Technology of Ireland.

It has several research areas, all of them related to language processing.

  • CALL (Computer Assisted Language Learning): Integrating CL/NLP/HLT Technology into CALL, CALL for Endangered Languages, CALL for Primary School Environments, CALL for Remedial Learners
  • Corpus Linguistics: Collocation, Contrastive Computational Linguistics, Corpus-based Translation Studies
  • Machine Translation and Translation Technology: Statistical and Rule-Based MT (SMT, RBMT), Example-Based MT (EBMT), Translation Memories (TMs), Boosting Existing MT Systems, Machine-Aided Translation (MAT), Computer-Aided Translation (CAT), Controlled Languages
  • Treebank-Based Unification Grammar Acquisition: Automatic Feature-Structure Annotation Algorithms, Subcategorisation Frame Extraction, Wide-Coverage Robust Probabilistic Unification Grammar Acquisition, PCFG-Based LFG Approximation, HPSG Acquisition, Multilingual Treebank-Based Grammar Acquisition
  • Semantics: Discourse Representation Theory, Linear-Logic Based Semantics, Computation of Logical Forms from Treebanks, Open-Domain Question Answering Systems
  • Speech Technology: Speaker Characterisation, Audio Classification, Retrieval and Coding, Human Computer Interfaces (HCIs)
  • Multilingual Information Retrieval/Extraction
  • Language Evolution

On the other hand, there is CLARIN, which was set up to create, coordinate and make available language resources and technology.
This research network aims for an infrastructure that is:

  • integrated: the resource and service centres are connected via Grid technology and form a virtually integrated domain
  • interoperable: the resources and services will be based on Semantic Web technologies to overcome format, structure and terminological differences
  • stable: the resources and services are offered with a high availability
  • persistent: the resources and services are planned to be accessible for many years so that researchers can rely on them
  • accessible: the resources and services are accessible via the web; different access methods and training possibilities are offered tailored to the needs of the communities making use of them
  • extendable: the infrastructure is open so that new resources and services can be added easily

Finally, I would like to mention a Spanish association whose aims are the establishment of channels for exchanging information and scientific materials, the organization of seminars, symposiums and conferences, the promotion of publications, and collaboration with other national or international institutions related to its area of activity.

EUROPEAN HLT RESEARCH CENTRES (Q1.3)

CMU-PORTUGAL

The CMU-PORTUGAL programme on language technology involves a consortium of Portuguese research centres and universities together with the Language Technologies Institute (LTI) at CMU. The LTI was formed about 20 years ago, first as a research centre and later as an academic department in the School of Computer Science, and it is the world's leading centre for language technologies. The consortium in Portugal, referred to here as the L2F consortium, includes the Laboratório de Sistemas de Lingua Falada (L2F, the spoken language systems laboratory) at INESC-ID, IST, the Linguistics Centre of the University of Lisbon (CLUL), the linguistics group of the Universidade do Algarve (UALG), and the Centre for Human Language Technology and Bioinformatics (HULTIG) of the Universidade da Beira Interior (UBI). In addition, close cooperation is expected with the LINGUATEC network established in Portugal by FCCN (the national foundation for scientific computing) through the “Centro de Recursos Distribuído para o Processamento Computacional da Língua Portuguesa”. The collaboration between L2F and CLUL dates back to the early 1990s and forms the basis for a truly interdisciplinary cooperation (engineering and linguistics). The cooperation with UALG and HULTIG is much more recent and, despite their small size in terms of human language technologies, is also very active.

NORDIC LANGUAGE TECHNOLOGY NETWORK

The Nordic network of language technology documentation centres is a collaboration between documentation centres in Denmark, Iceland, Norway, Sweden and Finland.
The aim of these centres is to ensure that future work in the field of language technology is available and reusable. Research results will be made available as widely as possible at both the national and the international level.
In Denmark, the documentation centre for language technology research (DanDokCenter) is hosted by the Centre for Language Technology (CST), which is also the coordinator of NorDokNet. The documentation centres are one of the activities funded under the Nordic language technology research programme, a research programme launched five years ago by the Nordic Council of Ministers. A programme committee appointed by the Council of Ministers has overall responsibility for the programme, which is administered by the Nordic Academy for Advanced Study (NordForsk).

NATIONAL CENTRE FOR LANGUAGE TECHNOLOGY, IRELAND

Language is the key modality in communication. The National Centre for Language Technology in Ireland carries out research into the processing of human language by computers, including speech recognition and synthesis, machine translation, human-computer interfaces, document retrieval and extraction, computer-assisted language teaching and learning, and software localisation and globalisation. Research in Human Language Technology (HLT) is interdisciplinary and includes Natural Language Processing (NLP) and Computational Linguistics (CL). HLT has substantial economic implications and potential. The centre carries out basic research and develops applications.

 

http://www.cmuportugal.org/ipn/LTIntroduction.aspx    01/04/08 20:30

http://www.nordoknet.org/    01/04/08 21:00

http://www.computing.dcu.ie/research/nclt/contact.html       01/04/08 21:45

http://www.computing.dcu.ie/research/nclt/        01/04/08 21:50

https://www.cs.tcd.ie/Elaine.UiDhonnchadha/irish.htm     01/04/08  21:50

 

HANS USZKOREIT (Q1.2)

                                                                                              

 Biography:

Hans Uszkoreit is a professor of Computational Linguistics at Saarland University. At the same time he serves as Scientific Director at the German Research Center for Artificial Intelligence (DFKI), where he heads the Language Technology Lab. By cooptation he is also a professor in the Department of Computer Science. Uszkoreit is a permanent member of the International Committee on Computational Linguistics and a member of the European Academy of Sciences, and his achievements include serving as president of the European Association for Logic, Language and Information. He is also a member of the Executive Board of the Language and Speech Station (Estación de Lenguaje Habla), a member of the Board of the European Lexical Linguistics Projects (Proyectos Europeos de Linguística Léxica), and he serves on a number of international editorial and advisory boards.

His research interests are computer models of natural language understanding and production, advanced applications of language and knowledge technologies such as semantic information systems, the cognitive foundations of language and knowledge, grammar formalisms and their implementation, the syntax and semantics of natural language, and German grammar.

His encyclopedia articles:

1. Uszkoreit, H. (1990): 16 Einträge für den Bereich der Unifikationsgrammatiken, In: H. Bußmann (Hrsg.) Lexikon der Sprachwissenschaft. Stuttgart: Kröner. [in der englischen Übersetzung als H. Bußmann, Routledge Dictionary of Language and Linguistics, Routledge, London and New York, 1996]

2. Uszkoreit, H. (1996): Hauptartikel Grammatikmodelle, sowie mehrere Lang-und Kurzartikel zum Themenbereich Grammatiktheorie. In G. Strube (Ed.) Wörterbuch der Kognitionswissenschaft. Klett-Cotta, Stuttgart, 1996.

His most recent articles:

1. Brants, Th., W. Skut & H. Uszkoreit (2003) Syntactic Annotation of a German Newspaper Corpus. In: A. Abeillé (Ed.) Treebanks: Building and Using Parsed Corpora, Book Series: TEXT, SPEECH AND LANGUAGE TECHNOLOGY, Volume 20, Kluwer Academic Publisher, Dordrecht.

2. Uszkoreit, H. (2004) New Chances for Deep Linguistic Processing. In: Chu-Ren Huang (Ed.) (“Frontiers in Computational Linguistics” in English), Shangwu Press Beijing (also as Keynote Lecture in: Proceedings of COLING 2002, Taipei).

3. Frank, A., H.-U. Krieger, F. Xu, H. Uszkoreit, B. Crysmann, B. Jörg, U. Schäfer (2007): Question Answering from Structured Knowledge Resources. In: Journal of Applied Logic, Volume 5, Issue 1, March 2007, Pages 20-48.

 

 

 

http://www.coli.uni-saarland.de/~hansu/bio.html 31-03-2008 17:00

http://www.coli.uni-saarland.de/~hansu/hucv_eng.pdf 01-04-2008 18:00

http://www.coli.uni-saarland.de/~hansu/bio.html 01-04-2008 20:00

What are human language technologies? (Q1)

1. Human language technologies are a relatively new discipline that investigates two main topics. First, they explore the theoretical and practical issues surrounding the ability to get technology, especially modern information and communication technologies (ICT), to interact with humans using natural language capabilities.

Second, it is a discipline that investigates how technologies, especially ICT, can serve as useful adjuncts to humans in language understanding, including analysis, processing, storage and retrieval.

This research can lead to practical applications, including the design of online teaching options.

2. Human Language Technology (HLT) is a relatively new term that covers a wide range of research and development areas in the field of language engineering. The aim of this module is to familiarise the student with the fundamental areas of HLT, including a range of applications of Natural Language Processing (NLP). NLP is a general term used to describe the use of computers to process information expressed in natural (i.e. human) languages. The term NLP is used in different contexts in this document and is one of the most important branches of HLT. There is a special interest group devoted to this area, the NLP SIG, within the EUROCALL professional association, and a special interest group in language instruction, ICALI, within the CALICO professional association. Both associations have similar aims, namely further research in a number of areas mentioned in this module, such as:

  • Artificial intelligence (AI)
  • Computational linguistics
  • Corpus-driven and corpus linguistics, formal linguistics
  • Machine-aided translation (MAT)
  • Machine translation (MT)
  • Natural language interfaces
  • Natural language processing (NLP)
  • Theoretical linguistics

All of the above are research fields that have produced results which have proved, are proving and will prove very useful in the field of computer-assisted language learning.

ADVANTAGES AND DISADVANTAGES OF BLOGS

As Wikipedia says, a blog (a portmanteau of web log) is a website where entries are commonly displayed in reverse chronological order. “Blog” can also be used as a verb, meaning to maintain or add content to a blog.

The advantages of blogs from an organizational perspective include the following:

  1. The consumer and citizen are potentially better informed and this can only be good for the long-term health of our societies and economies.

  2. Blogs have potential to help the organization develop stronger relationships and brand loyalty with its customers, as they interact with the ‘human face’ of the organization through blogs.

  3. Blogs, in an intranet environment, can be an excellent way of sharing knowledge within the organization.

  4. Blogs can be a positive way of getting feedback, and keeping your finger on the pulse, as readers react to certain pieces, suggest story ideas, etc.

  5. Blogs can build the profile of the writer, showcasing the organization as having talent and expertise.

The disadvantages of blogs are:

  1. Most people don’t have very much to say that’s interesting, and/or are unable to write down their ideas in a compelling and clear manner.

  2. I have often found that the people who have most time to write have least to say, and the people who have most to say don’t have enough time to write it. Thus, the real expertise within the organization lies hidden, as you get drowned in trivia.

  3. Like practically everything else on the Web, blogs are easy to start and hard to maintain. Writing coherently is one of the most difficult and time-consuming tasks for a human being to undertake. So, far from blogs being a cheap strategy, they are a very expensive one, in that they eat up time. As a result, many blogs are not updated, thus damaging rather than enhancing the reputation of the organization.

  4. Organizations are not democracies. The Web makes many organizations look like disorganizations, with multiple tones and opinions. Contrary to what some might think, the average customer prefers it if the organization they are about to purchase from is at least somewhat coherent.

http://www.gerrymcgovern.com/nt/2004/nt_2004_08_23_blogging.htm

http://en.wikipedia.org/wiki/Blogg

CHAT

According to Wikipedia, chat, also known as cyber-chat (cibercharla), is an anglicism that usually refers to written communication over the internet between two or more people that takes place instantaneously.

Loneliness is a constant in every human being's life, and this is where chats play a major role; however freaky it may seem, more and more people use them. Why this behaviour? In my humble opinion, it is the longing to be oneself that leads people to turn to chat: there, with a simple nick or pseudonym, you leave your problems aside, whether real or imagined, and invent a character without flaws, without your fears, without your insecurities.

Or you take the opposite stance: you are yourself, with your quirks, with the things you usually think but cannot share with anyone. Everything, absolutely everything, is allowed in a chat, and very probably you will not be the only one with that “quirk” or “peculiarity”. It is a bond between different ways of living and feeling, with one essential difference from reality: you are what you want to be.

http://es.wikipedia.org/wiki/Chat

What are the W3C's objectives?

 

The social value of the Web is that it enables human communication, commerce and opportunities to share knowledge. One of the W3C's primary goals is to make these benefits available to everybody, regardless of their hardware, software, network infrastructure, language, culture, geographic location, or physical or mental ability.
The W3C continues to expand its influence through initiatives that directly support the spread of Web technologies and their benefits in developing countries. The work done in areas such as Web accessibility, internationalization, device independence and the mobile Web is especially important, since the W3C is working towards a Web for everyone. At the same time, through the W3C Offices and other efforts to increase participation, the W3C is committed to creating a Web that is accessible to more people around the world.
The document on worldwide participation in the World Wide Web Consortium summarizes the efforts that have been made to increase worldwide participation in the work of the W3C, and to ensure that the results obtained in the Consortium benefit an even larger community.

 

 http://www.w3c.es/Consorcio/mision

What is HTML?

HyperText Markup Language (HTML) is a language to specify the structure of documents for retrieval across the Internet using browser programs of the WorldWideWeb.

HTML is an application of the Standard Generalized Markup Language (SGML) which is the International Standard for text markup. The principle is that text markup concentrates on structure rather than appearance, making the files more reusable and leaving the visual details to the end-user software (like the browser you’re reading this with now). For the reasons why, see Eliot Kimber’s comments.
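To illustrate the point that markup describes structure rather than appearance, here is a minimal sketch that reads a small HTML fragment with Python's standard `html.parser` module and recovers its outline of headings and paragraphs without any knowledge of fonts or layout. The sample markup, the class name `OutlineParser` and the helper `outline_of` are made up for this example.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect [tag, text] pairs for headings and paragraphs."""

    STRUCTURAL = {"h1", "h2", "p"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._current = None  # structural tag we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.STRUCTURAL:
            self._current = tag
            self.outline.append([tag, ""])

    def handle_data(self, data):
        # Text inside inline tags such as <em> is still part of the paragraph.
        if self._current is not None:
            self.outline[-1][1] += data

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None


SAMPLE = """<html><body>
<h1>Machine translation</h1>
<p>MT is translation led by a <em>computer system</em>.</p>
<h2>Post-editing</h2>
<p>A human corrects the output.</p>
</body></html>"""


def outline_of(markup):
    """Return the document outline as a list of (tag, text) tuples."""
    parser = OutlineParser()
    parser.feed(markup)
    parser.close()
    return [(tag, text.strip()) for tag, text in parser.outline]


if __name__ == "__main__":
    for tag, text in outline_of(SAMPLE):
        print(tag, "|", text)
```

Because the tags name roles (heading, paragraph, emphasis) and not visual styles, the same file can be rendered by a browser, indexed by a search tool, or outlined by a script like this one.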

Details of the specification are in the IETF Draft and the HTML Document Type Description. There is a FAQ (Frequently-Asked Questions) document, and a new book on HTML and the WorldWideWeb out shortly.

http://www.ucc.ie/info/net/whatis.html
