Notes.ini Entry



Name:

    KDS_UseInxight3

Syntax:

    KDS_UseInxight3=0 / 1

Applies to:

    Servers - Discovery

Add-on:


    First Release:


      Obsolete since:


        Category:

          Operating System, K-Station

        Default:

          None

        UI equivalent:

          None

        Description:
        Summarizer is determining incorrect languages

        Discovery Server is delivered with Inxight version 2 enabled by default in the English versions. Inxight version 3 is the default for all other language versions.
        The Inxight 2 Summarizer determines the wrong language for some XML tags defined by the Web spider and, in certain cases, cannot guess the correct language.

        Workaround: Switch to Inxight 3 by changing the Notes.ini setting from KDS_USEINXIGHT3=0 to KDS_USEINXIGHT3=1.
        Inxight support for languages is as follows:

      • Version 2 - Danish, Dutch, English, Finnish, French, German, Italian, Norwegian (Bokmal), Norwegian (Nynorsk), Portuguese, Spanish, and Swedish
      • Version 3 - Chinese (Simplified), Chinese (Traditional), Danish, Dutch, English, Finnish, French, German, Italian, Japanese, Korean, Norwegian (Bokmal), Norwegian (Nynorsk), Portuguese, Spanish, and Swedish
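
        The workaround above is a one-line change in Notes.ini. The following is a minimal Python sketch of that change, operating on the file contents as a string. The helper name set_notes_ini_value is hypothetical, and matching the key without regard to case is an assumption based only on the two spellings used in this entry (KDS_UseInxight3 and KDS_USEINXIGHT3).

```python
def set_notes_ini_value(ini_text, key, value):
    """Return ini_text with `key` set to `value`.

    Assumption: entries are plain `name=value` lines and the key is
    matched case-insensitively. An existing entry is replaced in place;
    otherwise a new line is appended.
    """
    lines = ini_text.splitlines()
    target = key.lower()
    found = False
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        if sep and name.strip().lower() == target:
            lines[i] = f"{key}={value}"
            found = True
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Sample contents are illustrative, not taken from a real server.
sample = "[Notes]\nKDS_USEINXIGHT3=0\n"
updated = set_notes_ini_value(sample, "KDS_UseInxight3", "1")
print(updated)
```

        Make a backup copy of Notes.ini before writing any change back to disk.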

        In addition:
      • Performance: Inxight 2 runs faster than Inxight 3.
      • Multi-word tokens (such as "International Business Machines", "to and fro", and so on): Inxight 2 doesn't support them, while Inxight 3 does. Using Inxight 2, you get "international", "business", "machine", "to", "and", and "fro" all as separate tokens. Using Inxight 3, you get individual multi-word tokens such as "International Business Machines" and "to and fro."
      • Inxight 3 performs word normalization, such as capitalization and stemming, better than Inxight 2.
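
        The difference in multi-word token handling can be illustrated with a toy Python sketch. This is not how Inxight is implemented: the phrase list is supplied by hand, both function names are hypothetical, and no stemming is performed (so "machines" is not reduced to "machine" here).

```python
import re


def single_word_tokens(text):
    """Split text into lowercase single-word tokens (Inxight 2 style, roughly)."""
    return re.findall(r"[a-z]+", text.lower())


def multi_word_tokens(text, phrases):
    """Keep known phrases together as one token (Inxight 3 style, roughly).

    `phrases` is a hand-written list for illustration only.
    """
    words = re.findall(r"[A-Za-z]+", text)
    # Try longer phrases first so the longest match wins.
    phrase_words = sorted((p.split() for p in phrases), key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(words):
        for pw in phrase_words:
            window = [w.lower() for w in words[i:i + len(pw)]]
            if window == [w.lower() for w in pw]:
                tokens.append(" ".join(pw))
                i += len(pw)
                break
        else:
            # No phrase starts at this position; emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens


text = "International Business Machines moved to and fro"
phrases = ["International Business Machines", "to and fro"]
print(single_word_tokens(text))
print(multi_word_tokens(text, phrases))
```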

        Inxight 3.x stems some Spanish words incorrectly
        This happens especially when a document title is in all caps. If titles are not detected, capitalized words are recognized as proper nouns and are not case-normalized or stemmed properly by Inxight.

        When the OS locale is set to Spanish, the time format changes to "a.m." (with periods)
        Times formatted with periods in the Schedule fields of a data repository definition form are not parsed correctly, resulting in an error when you try to save the definition form.
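
        This entry lists no workaround for the issue. As an illustration only, the Python sketch below rewrites period-style markers such as "a.m." to "AM" before a time string is entered. The function name normalize_meridiem is hypothetical, and whether the Schedule fields accept the period-free form is an assumption inferred from the description above.

```python
import re

# Matches "a.m.", "p.m.", "A.M.", "P.M.", with an optional space after
# the first period (for example "a. m.").
_MERIDIEM = re.compile(r"\b([ap])\.\s?m\.", re.IGNORECASE)


def normalize_meridiem(value):
    """Replace period-style AM/PM markers with plain "AM" or "PM"."""
    return _MERIDIEM.sub(lambda m: m.group(1).upper() + "M", value)


for raw in ["9:30 a.m.", "4:15 P.M.", "10:00 AM"]:
    print(raw, "->", normalize_meridiem(raw))
```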

        Inxight 3 memory/performance problem results in slow spidering times
        Upgrading from Inxight 2 to Inxight 3 can significantly slow spider performance.
        Workaround: Run the Discovery spiders on a secondary Discovery Server dedicated to spidering, and narrow the list of languages in std.langid-config, as described in the release note, 'Too Many Languages Listed in std.langid-config File' below.

        Different versions of the same word are case-normalized differently by Inxight
        Words such as "Monday", "monday", "MONDAY", "MoNdAy", "January", and "january" are not case-normalized consistently, so the K-map Building service treats each variant as a separate token. All the "Monday" variants should be collapsed into a single form (say, "Monday"), and likewise all the "January" variants. With the current case-normalization behavior, ATG could build clusters around each variant, resulting in different categories for different versions of the same token.
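
        The effect can be sketched in a few lines of Python. This only illustrates the counting problem, not the Inxight or K-map Building implementation; the collapsing rule used here (case-fold, then capitalize the first letter) is an assumption chosen to match the "Monday" example.

```python
from collections import Counter

tokens = ["Monday", "monday", "MONDAY", "MoNdAy", "January", "january"]


def count_raw(tokens):
    """Count tokens exactly as they appear, so each case variant is distinct."""
    return Counter(tokens)


def count_case_normalized(tokens):
    """Collapse case variants into one form before counting."""
    return Counter(t.casefold().capitalize() for t in tokens)


print(len(count_raw(tokens)), "distinct tokens without normalization")
print(dict(count_case_normalized(tokens)))
```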

        Workaround: Specify every variation of the word in the stopword list so that all of them are filtered out. To edit the language-appropriate stopword list:
        1. Locate the file, stopwords_<language suffix>.txt, in the Discovery Server's data directory, where <language suffix> could be en, es, it, ko, and so forth.
        2. Make a backup copy of the file before opening it in a text editor such as Notepad.
        3. Add the words to be ignored to the appropriate alphabetical section.
        4. Save the updated file.
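
        Steps 2 and 3 can be partly scripted. The Python sketch below makes the backup copy and generates the common case variants of a word for manual insertion. Both function names are hypothetical, the .bak suffix is an arbitrary choice, and only three case forms are generated, so irregular spellings such as "MoNdAy" still have to be added by hand. The sketch does not edit the stopword file itself, because this entry does not describe the file's section layout.

```python
import shutil
from pathlib import Path


def backup_stopword_file(path):
    """Copy the stopword file next to itself with a .bak suffix; return the backup path."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # byte-for-byte copy, so the file encoding does not matter
    return backup


def case_variants(word):
    """Return the lowercase, capitalized, and uppercase forms of a word, without duplicates."""
    forms = [word.lower(), word.capitalize(), word.upper()]
    return list(dict.fromkeys(forms))  # preserves order while dropping repeats


for word in ["monday", "january"]:
    print(case_variants(word))
```

        A call such as backup_stopword_file("stopwords_en.txt") would be run from the Discovery Server's data directory before the file is opened for editing.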

        Using Inxight 3, long abstract summary text might be truncated mid-word
        This occurs for languages where white space is a valid word boundary. Inxight 2 has better support for Western languages in which white spaces are a reliable word boundary.

        Inxight 2 might produce labels in a Spanish K-map that are verbs instead of nouns
        Workaround: Upgrade to Inxight 3. Doing so reduces how often verbs appear in the K-map but does not eliminate them entirely.

        Problems with font and text formation in abstract summaries when using Inxight 3 with English repositories