[Please note that some of the entries are very old.*]

Friday 13 March 2015

TMX Files, an Approach

All TMX files are standard translation memories, so in that sense they are equal. But if you treat them all equally, you’re unlikely to get the most out of them.

For this approach, I’ll assume you use Auto-Assemble (AA). This may not be the best approach in all situations, nor for all language pairs. However, I think AA is very useful in most situations; in fact, I consider it the core feature of a decent CAT tool.
AA comes up with suggestions based on the CAT tool's algorithms and your (priority) settings of the connected memories for segments (TM), and memories for words and phrases (termbase). If you get the wrong "hits," you can change those settings, and/or add the correct term in the TM/TB with the highest priority, so next time, it'll show up correctly.
In Menu | Edit | Options | Auto-Assembling, you can select whether you want to use the Auto-Assembling Panel (a pop-up panel) or the Automatic Insertion of Matches. I prefer the latter. However, I can imagine that automatic insertion of those results can be counterproductive, especially if the word order of the target language differs from that of the source language. That doesn't make the actual results less useful, though. Besides, when you arrive at a new segment, the AA results are already selected, so you can delete them with your very first keystroke if you don’t like the result. Memorise the results you do like, though.
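To make the role of priorities a bit more concrete, here is a minimal sketch in Python. It is not CafeTran's actual algorithm, which also handles fuzzy matches and sub-segments; it only assumes exact look-ups, and the names Memory, PRIORITY_RANK and best_hit, as well as the sample entries, are made up for illustration. It shows why adding the correct term to the memory with the highest priority makes it "win" the next time.

```python
from dataclasses import dataclass, field

# Hypothetical ranking: a higher number wins when several memories have a hit.
PRIORITY_RANK = {"high": 3, "medium": 2, "low": 1}


@dataclass
class Memory:
    name: str
    priority: str                                  # "high", "medium" or "low"
    entries: dict = field(default_factory=dict)    # source text -> target text


def best_hit(source, memories):
    """Return (memory name, target) from the highest-priority memory that
    contains an exact entry for `source`, or None if no memory has one."""
    candidates = [m for m in memories if source in m.entries]
    if not candidates:
        return None
    winner = max(candidates, key=lambda m: PRIORITY_RANK[m.priority])
    return winner.name, winner.entries[source]


big_papa = Memory("Big Papa", "low", {"button": "toets"})
project_tm = Memory("ProjectTM", "high")

# Only the low-priority memory knows the term, so its (wrong) hit is used.
print(best_hit("button", [big_papa, project_tm]))

# Add the correct term to the high-priority memory: next time it wins.
project_tm.entries["button"] = "knop"
print(best_hit("button", [big_papa, project_tm]))
```

The point is only the ordering: you don't have to clean up the low-priority memory, as long as the correct entry lives in a memory that outranks it.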

You can use TMX files for memories for segments (TM), for memories for words and phrases (termbase), or for both. Whenever you create or add a new memory, you’ll have to indicate how you want to use it in the New Memory dialogue. It’s the first entry, under Memory Type. Or you can select it in the Dashboard, using the gear wheel at the bottom. Beware, however, that in the Dashboard, you cannot use different settings for your various TMX files. For the Dashboard, all TMX files are equal… In my approach, using a TMX file for both segments and words and phrases isn’t very useful.
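That one TMX file can serve either purpose is easy to see when you look inside one. The sketch below uses Python's standard xml.etree.ElementTree to read a tiny hand-made TMX sample; the sample content, the language pair and the function name read_units are assumptions for illustration only. A full sentence and a two-word term are stored in exactly the same kind of translation unit, so the difference between a "TM" and a "termbase" lies in how you tell the tool to use the file.

```python
import xml.etree.ElementTree as ET

# A tiny hand-made TMX sample (illustrative content, not from a real project).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Press the red button.</seg></tuv>
      <tuv xml:lang="nl"><seg>Druk op de rode knop.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>red button</seg></tuv>
      <tuv xml:lang="nl"><seg>rode knop</seg></tuv>
    </tu>
  </body>
</tmx>
"""

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_units(tmx_text, src, tgt):
    """Return a list of (source, target) pairs for the given language codes."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text around inline tags, if any.
                texts[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src in texts and tgt in texts:
            pairs.append((texts[src], texts[tgt]))
    return pairs


for source, target in read_units(SAMPLE_TMX, "en", "nl"):
    print(f"{source} -> {target}")
```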

TMs
For me, different TMs require different settings. It is of course possible to use only one TM, but it’s far more likely you’ll end up having several. I distinguish between:
  • ProjectTM. A TM for segments. It’s the TM CT wants you to (de)select first in the Dashboard. I strongly suggest you select it, not least because it can play a huge role in Auto-Completion. Since it’s usually not a big file, though it’ll “grow” during the project, you can use it to save your work automatically and very frequently (in Options | Workflow, I set Autosave Project to after two segments; all other TMs I set to 5). It allows you to save tags, and this is the only TM for which I think this is useful. It’s also very suitable for checking consistency within the project (see QA). That means you should set the ProjectTM to Keep Newer Duplicates when you create or open it. And since this is your current job, you should assign the highest priority to the ProjectTM. The settings: in the New Memory dialogue, select (from top to bottom): [Memory Type] Translation Memory, Processing Tags, Terms Consistency Check, [Options] High Priority, Automatic, Fuzzy and Hits, Keep Newer Duplicates.
  • Any memories for segments provided by the client. Ideally, they are very important, and should be used for high-priority hits and consistency checks. You don’t want to “pollute” them with your own translations, so they should be set to Read-Only. The settings: [Memory Type] Translation Memory, Terms Consistency Check, Read-Only, [Options] High Priority, Automatic, Fuzzy and Hits. Since they are Read-Only, you don’t have to worry about the duplicates. You may have to review those settings, as some client-provided TMs are pure faeces.
  • A general memory for segments (Big Mama). This is the TM in which you keep all your translations for the language pair. It’s optional of course, but results from your Big Mama may surprise you positively. It may become too big to use for AA, so you may have to set it to Manual workflow integration. Excluding the Big Mama from consistency checks when doing the QA at the end of the project is essential. The settings: [Memory Type] Translation Memory, optional: Pretranslate Only, [Options] Low Priority, Manual, Fuzzy (Fuzzy and Hits will take much longer to assemble; this goes for all TMs, of course), Keep All Duplicates.
  • Huge third-party subject specific memories for segments, like the DGT for EU jobs. The settings: [Memory Type] Translation Memory, optional: Pretranslate Only, [Options] Medium Priority, Manual or Pretranslate, Fuzzy (Fuzzy and Hits will take much longer to assemble, this goes for all TMs, of course).
  • Other subject specific memories for segments: You’ll have to decide the settings based on the situation. Since I use a Big Mama, I don’t have much experience with them. If those TMs are from other sources than your client or your own jobs, be very careful.
Memories for terms play a major role in my approach. I don’t use a project-specific termbase, because the project terms will show up in my high-priority project-specific TM anyway. However, I do use the following:
  • Big Papa, the equivalent of the Big Mama for words and phrases. Add as many general words and phrases to it as you can; it will pay you back generously. The settings: [Memory Type] Termbase, [Options] Low Priority, Automatic (unless it gets too big, which is less likely than in the case of your Big Mama), Fuzzy, Keep All Duplicates.
  • Any memories for terms provided by the client. See memories for segments provided by the client. The settings: [Memory Type] Termbase, Terms Consistency Check, Read-Only, [Options] High Priority, Automatic, Fuzzy. Since they are Read-Only, you don’t have to worry about the duplicates. You may have to review those settings, as some client-provided termbases are pure faeces.
  • Subject-specific memories for terms (rather than client-specific ones). You may want to use more than one. They are used next to the Big Papa, and should overrule it. Arguably the most important TMX memories I can think of. The settings: [Memory Type] Termbase, [Options] High Priority, Automatic (unless it gets too big), Fuzzy, Keep Newer Duplicates.
  • Huge third-party subject-specific memories for words and phrases, like the IATE for EU jobs. The settings: [Memory Type] Termbase, optional: Pretranslate Only, [Options] Medium Priority, Manual or Pretranslate, Fuzzy.
UPDATE: Since the introduction of Total Recall, the "pretranslate" function seems to have become redundant, unless the TM resulting from Total Recall is too large to be processed the regular way.
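For quick reference, the settings listed above can be collected in one small Python table. This is only a recap of my own preferences as described in the lists, not something CafeTran reads or exports; the class MemorySettings and its field names are made up for this sketch, and where I didn't mention a duplicates setting the field is simply left empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySettings:
    role: str
    memory_type: str      # "Translation Memory" or "Termbase"
    priority: str         # "High", "Medium" or "Low"
    workflow: str         # "Automatic", "Manual", ...
    matching: str         # "Fuzzy" or "Fuzzy and Hits"
    duplicates: str = ""  # empty where the lists above don't mention it
    extras: tuple = ()    # other options ticked in the New Memory dialogue


TM, TB = "Translation Memory", "Termbase"
CHECK = "Terms Consistency Check"

SETTINGS = [
    MemorySettings("ProjectTM", TM, "High", "Automatic", "Fuzzy and Hits",
                   "Keep Newer Duplicates", ("Processing Tags", CHECK)),
    MemorySettings("Client TM", TM, "High", "Automatic", "Fuzzy and Hits",
                   extras=(CHECK, "Read-Only")),
    MemorySettings("Big Mama", TM, "Low", "Manual", "Fuzzy",
                   "Keep All Duplicates", ("optional: Pretranslate Only",)),
    MemorySettings("Third-party TM (e.g. DGT)", TM, "Medium",
                   "Manual or Pretranslate", "Fuzzy",
                   extras=("optional: Pretranslate Only",)),
    MemorySettings("Big Papa", TB, "Low", "Automatic", "Fuzzy",
                   "Keep All Duplicates"),
    MemorySettings("Client termbase", TB, "High", "Automatic", "Fuzzy",
                   extras=(CHECK, "Read-Only")),
    MemorySettings("Subject termbases", TB, "High", "Automatic", "Fuzzy",
                   "Keep Newer Duplicates"),
    MemorySettings("Third-party termbase (e.g. IATE)", TB, "Medium",
                   "Manual or Pretranslate", "Fuzzy",
                   extras=("optional: Pretranslate Only",)),
]


def describe(s):
    """Format one entry in the same order as the New Memory dialogue."""
    parts = [s.memory_type, *s.extras, f"{s.priority} Priority",
             s.workflow, s.matching]
    if s.duplicates:
        parts.append(s.duplicates)
    return f"{s.role}: " + ", ".join(parts)


for entry in SETTINGS:
    print(describe(entry))
```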