Home » Server Options » Text & interMedia » Domain index >stemming concept
Domain index >stemming concept [message #599106] Tue, 22 October 2013 04:36 Go to next message
rameshaimhigh@gmail.com
Messages: 10
Registered: October 2013
Junior Member
Hi all,

I am using CONTAINS with stem($) operator in it,in SQL query. Will stem approach support non-english keyword?

if stem will not support non-english keyword then is there any way to rectify this.

please advice on this

English Keyword SQL Query:
--------------------------
select itemnum, score(1), description from maximo.item
where (( contains(description,' $bearing ', 1) > 0 )) and itemnum = '1357851011'

Non-English Keyword SQL Query:
--------------------------
select itemnum, score(1), description from maximo.item
where (( contains(description,' $素晴らしい教育水準 ', 1) > 0 )) and itemnum = '1357851011'

Waiting for your reply
Ramesh

[Updated on: Tue, 22 October 2013 04:43]

Report message to a moderator

Re: Domain index >stemming concept [message #599117 is a reply to message #599106] Tue, 22 October 2013 05:22 Go to previous messageGo to next message
sreenivasulu468
Messages: 4
Registered: October 2013
Location: Bangalore
Junior Member
I think it supports Have u tried above query? Does it cause any error.if yes please specify error message?
Re: Domain index >stemming concept [message #599131 is a reply to message #599117] Tue, 22 October 2013 06:44 Go to previous messageGo to next message
rameshaimhigh@gmail.com
Messages: 10
Registered: October 2013
Junior Member
No it is not returning any error. It will give the result for non-engish keword with exact match.

Say for example for stemming apporach, if i search engineer then it will check internally with engineer and engineering too and give the results.

But for non-english keyword if i search engineer with Chinese language then it will give the result of exact matching. But i need the result for chinese keyword of engineering too.

this is my problem. is there any solution? please welcome!!!!!!!!!!

[Updated on: Tue, 22 October 2013 06:46]

Report message to a moderator

Re: Domain index >stemming concept [message #599181 is a reply to message #599131] Tue, 22 October 2013 16:21 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
According to the documentation, stemming is available through either the stemmer attribute of a wordlist or the index_stems attribute of a lexer. I have never seen stemming work with the index_stems attribute of a lexer, but the stemmer attribute of a wordlist does work. The two are in conflict. If you specify the index_stems attribute of a lexer, then it uses that instead of the stemmer attribute of the wordlist, so it is important not to use the index_stems attribute of the lexer. It also requires that the nls_language for the session match. However, such things are only available for certain languages. It appears that only the auto_lexer supports Chinese, but as I said the index_stems attribute for the auto_lexer doesn't seem to work. You might try posting your problem on the OTN Text forum with a clear example. In the following, it shows what should work but does not, then what works for most other languages.

It is important to note that if you do not specify a wordlist, then by default it uses a basic_wordlist with stemmer set to English (or other language depending on your language settings). Also, the default value of index_stems for the auto_lexer is yes. Also, sometimes the auto_lexer does not accurately detect the correct language. This tends to happen with shorter documents. The longer the document, the more likely it will accurately detect the language.

Please note that I do not know Chinese and am relying on Google translate, so if I have something wrong that may be part of the problem. It might help if you provided a Chinese example of what you are searching for and expecting to find, but not finding. In my example, I have found that searching for '工程' finds both that and '工程师', but searching for '工程师' only finds itself.

-- test environment (table and data):
SCOTT@orcl12c> CREATE TABLE item
  2    (itemnum      NUMBER,
  3  	description  VARCHAR2(30))
  4  /

Table created.

SCOTT@orcl12c> INSERT ALL
  2  INTO item VALUES (1357851011, 'engineer')
  3  INTO item VALUES (1357851011, 'engineering')
  4  INTO item VALUES (1357851011, '工程师')
  5  INTO item VALUES (1357851011, '工程')
  6  SELECT * FROM DUAL
  7  /

4 rows created.


-- what theoretically should work, but does not:
SCOTT@orcl12c> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('test_lex', 'AUTO_LEXER');
  3    CTX_DDL.SET_ATTRIBUTE ('test_lex', 'INDEX_STEMS', 'YES');
  4  END;
  5  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c> CREATE INDEX item_description_idx
  2  ON item (description)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  PARAMETERS
  5    ('LEXER	  test_lex')
  6  /

Index created.

SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
  2  FROM   item
  3  WHERE  CONTAINS (description,' $engineer', 1) > 0
  4  AND    itemnum = 1357851011
  5  /

   ITEMNUM   SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011         10 engineer

1 row selected.

SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
  2  FROM   item
  3  WHERE  CONTAINS (description,' $engineering', 1) > 0
  4  AND    itemnum = 1357851011
  5  /

   ITEMNUM   SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011          8 engineer
1357851011          8 engineering

2 rows selected.

SCOTT@orcl12c> ALTER SESSION SET NLS_LANGUAGE = 'SIMPLIFIED CHINESE'
  2  /

Session altered.

SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
  2  FROM   item
  3  WHERE  CONTAINS (description,' $工程师', 1) > 0
  4  AND    itemnum = 1357851011
  5  /

   ITEMNUM   SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011          5 工程师

1 row selected.

SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
  2  FROM   item
  3  WHERE  CONTAINS (description,' $工程', 1) > 0
  4  AND    itemnum = 1357851011
  5  /

   ITEMNUM   SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011          3 工程师
1357851011          3 工程

2 rows selected.

SCOTT@orcl12c> ALTER SESSION SET NLS_LANGUAGE = 'AMERICAN'
  2  /

Session altered.


-- what works for most other languages:
SCOTT@orcl12c> BEGIN
  2    CTX_DDL.DROP_PREFERENCE ('test_lex');
  3    CTX_DDL.CREATE_PREFERENCE ('test_lex', 'AUTO_LEXER');
  4    CTX_DDL.SET_ATTRIBUTE ('test_lex', 'INDEX_STEMS', 'NO');
  5    CTX_DDL.CREATE_PREFERENCE ('test_wordlist', 'BASIC_WORDLIST');
  6    CTX_DDL.SET_ATTRIBUTE ('test_wordlist', 'STEMMER', 'AUTO');
  7  END;
  8  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c> DROP INDEX item_description_idx
  2  /

Index dropped.

SCOTT@orcl12c> CREATE INDEX item_description_idx
  2  ON item (description)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  PARAMETERS
  5    ('LEXER	  test_lex
  6  	 WORDLIST test_wordlist')
  7  /

Index created.

SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
  2  FROM   item
  3  WHERE  CONTAINS (description,' $engineer', 1) > 0
  4  AND    itemnum = 1357851011
  5  /

   ITEMNUM   SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011          5 engineer
1357851011          5 engineering

2 rows selected.

SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
  2  FROM   item
  3  WHERE  CONTAINS (description,' $engineering', 1) > 0
  4  AND    itemnum = 1357851011
  5  /

   ITEMNUM   SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011          5 engineer
1357851011          5 engineering

2 rows selected.

SCOTT@orcl12c> ALTER SESSION SET NLS_LANGUAGE = 'SIMPLIFIED CHINESE'
  2  /

Session altered.

SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
  2  FROM   item
  3  WHERE  CONTAINS (description,' $工程师', 1) > 0
  4  AND    itemnum = 1357851011
  5  /

   ITEMNUM   SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011          5 工程师

1 row selected.

SCOTT@orcl12c> SELECT itemnum, SCORE(1), description
  2  FROM   item
  3  WHERE  CONTAINS (description,' $工程', 1) > 0
  4  AND    itemnum = 1357851011
  5  /

   ITEMNUM   SCORE(1) DESCRIPTION
---------- ---------- ------------------------------
1357851011          4 工程师
1357851011          4 工程

2 rows selected.

Re: Domain index >stemming concept [message #599199 is a reply to message #599181] Tue, 22 October 2013 23:42 Go to previous messageGo to next message
rameshaimhigh@gmail.com
Messages: 10
Registered: October 2013
Junior Member
Thanks for your timing help and to spend time to create scripts and make me to get clear. it is really very helpful.

You mean to say that, if this needs to work means, then we need to alter the session's nls_language right?

Is there any common nls_language in session will support all languages?

Re: Domain index >stemming concept [message #599207 is a reply to message #599199] Wed, 23 October 2013 00:34 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
rameshaimhigh@gmail.com wrote on Wed, 23 October 2013 10:12
Is there any common nls_language in session will support all languages?



Tha language component of the NLS_LANG specifies conventions such as the language used for Oracle messages, sorting, day names, and month names. Each supported language has a unique name; for example, AMERICAN , FRENCH , or GERMAN . The language argument specifies default values for the territory and character set arguments. If the language is not specified, then the value defaults to AMERICAN. So, each of the unique language specified will thus act uniquely.

Have a look at this link, http://www.oracle.com/technetwork/database/globalization/nls-lang-099431.html

[Updated on: Wed, 23 October 2013 00:34]

Report message to a moderator

Re: Domain index >stemming concept [message #599225 is a reply to message #599207] Wed, 23 October 2013 02:08 Go to previous messageGo to next message
rameshaimhigh@gmail.com
Messages: 10
Registered: October 2013
Junior Member
Thanks Lalit kumar!!!! I understood now.
Re: Domain index >stemming concept [message #599343 is a reply to message #599199] Wed, 23 October 2013 14:54 Go to previous message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
rameshaimhigh@gmail.com wrote on Tue, 22 October 2013 21:42
Thanks for your timing help and to spend time to create scripts and make me to get clear. it is really very helpful.

You mean to say that, if this needs to work means, then we need to alter the session's nls_language right?

Is there any common nls_language in session will support all languages?



What I said is that, if it were some other supported language other than Chinese, then changing the nls_language for the session is one of the things that needs to be done. However, it doesn't work for Chinese, no matter what you do, even though the documentation says that it should. This is why I suggested that you post the question on the OTN forums, where the Oracle Text product manager and others frequently respond. Alternatively, if you have paid Oracle support, then you could use that. In either case, you need to provide them with a clear test case that demonstrates the problem, similar to what I posted.
Previous Topic: Oracle Text indexes Sync Memory
Next Topic: CONTAIN function syntax usage
Goto Forum:
  


Current Time: Thu Mar 28 04:25:28 CDT 2024