8.2.1 - Exercise

Use the sample collective bargaining agreement file (the English version) and try to extract phrases and vocabulary that might be useful for a translation assignment involving a similar document. Use the cross-platform tool ExtPhrJ (extrphrj.jar), developed by Tim Craven, Professor Emeritus at the University of Western Ontario. Follow the procedure outlined below and extract at least the 30 most prominent phrases/collocations:
1. Open the application, click File -> Extract from (Figure 1), browse to the location of the relevant file and press the Open button.
2. Go to Tim Craven's website and download a list of stop words (stoplist.txt).
3. Load the stoplist into the tool by choosing Options -> Stoplist (Figure 3). Note that the output for the file you loaded in step 1 becomes considerably shorter.
4. To reduce the output further, make sure Options -> Show full phrases is ticked. You may also activate the Edit -> Collapse option (Figure 2); this removes shorter phrases that are already included in longer ones.
5. Depending on the length of the text, it may also be necessary to adjust the minimum number of occurrences through Options -> Minimum occurrences.
6. Once you are satisfied with the output, save the list as a plain-text (*.txt) file. Possible follow-up activities include converting the list first into an Excel file and later perhaps into a MultiTerm termbase.
Now repeat the process, but this time work with the Czech text and the Czech stoplist!
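The internals of ExtPhrJ are not described in this exercise, but the effect of the stoplist, the minimum-occurrence threshold and the Collapse option can be illustrated with a minimal Python sketch. It is an approximation under stated assumptions, not ExtPhrJ's actual algorithm: it assumes stop words and punctuation act as phrase boundaries, counts every contiguous word sequence of up to `max_words` words in the remaining runs, keeps sequences occurring at least `min_occurrences` times and, when `collapse` is on, drops phrases contained in a longer retained phrase. The function names, parameters and the one-word-per-line stoplist format are hypothetical.

```python
import re
from collections import Counter


def load_stoplist(path, encoding="utf-8"):
    """Read a stoplist, assuming one stop word per line (hypothetical format)."""
    with open(path, encoding=encoding) as f:
        return {line.strip().lower() for line in f if line.strip()}


def content_chunks(text, stopwords):
    """Split text into runs of consecutive content words.

    Assumption: stop words and punctuation marks act as phrase boundaries.
    """
    chunks, current = [], []
    # \w is Unicode-aware for str patterns, so Czech letters such as "í" or "č" count as word characters
    for token in re.findall(r"\w+|[^\w\s]", text.lower()):
        if token in stopwords or not re.match(r"\w", token):
            if current:
                chunks.append(current)
            current = []
        else:
            current.append(token)
    if current:
        chunks.append(current)
    return chunks


def extract_phrases(text, stopwords, min_occurrences=2, max_words=5, collapse=True):
    """Return (phrase, count) pairs, most frequent first, ties sorted alphabetically."""
    counts = Counter()
    for chunk in content_chunks(text, stopwords):
        # Count every contiguous word sequence of 1..max_words words in the chunk
        for n in range(1, min(max_words, len(chunk)) + 1):
            for i in range(len(chunk) - n + 1):
                counts[" ".join(chunk[i:i + n])] += 1
    # Rough analogue of "Minimum occurrences": discard infrequent sequences
    phrases = {p: c for p, c in counts.items() if c >= min_occurrences}
    if collapse:
        # Rough analogue of "Collapse": drop a phrase contained in a longer retained phrase
        phrases = {
            p: c for p, c in phrases.items()
            if not any(q != p and f" {p} " in f" {q} " for q in phrases)
        }
    return sorted(phrases.items(), key=lambda pc: (-pc[1], pc[0]))


def to_tsv(phrases):
    """Format the list as tab-separated lines (phrase, count) that Excel can open."""
    return "\n".join(f"{p}\t{c}" for p, c in phrases)


if __name__ == "__main__":
    # Toy stoplist and text; a real run might use load_stoplist("stoplist.txt")
    stoplist = {"the", "of", "is", "in"}
    sample = ("The collective agreement sets the rate of overtime pay. "
              "Overtime pay is defined in the collective agreement.")
    print(to_tsv(extract_phrases(sample, stoplist)))
```

With collapsing on, single words such as "overtime" and "pay" disappear because "overtime pay" is retained, which mirrors the reduction described in step 4. The tab-separated output of `to_tsv` can be opened in Excel, in line with the follow-up activity in step 6. The same functions work on the Czech text, provided the Czech stoplist is read with the correct encoding (UTF-8 is assumed here).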