
Semantic Search uses the System Locale setting to look for stopwords in documents stored in FileTables
Submitted by MauriD


Status: Resolved as Won't Fix


Type: Bug
ID: 753596
Opened: 7/13/2012 7:01:11 AM
Access Restriction: Public
Workarounds: 2
Users who can reproduce this bug: 1

Description

When using FileTable to store documents, the language used to check if stopwords are present in the indexed documents is the one defined as System Locale and not the one defined in the full-text index.

The attached script shows the behavior. Please note that in order to reproduce the problem, the System Locale MUST NOT be English; use Italian, for example.
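The attached script is not reproduced on this page. As a rough illustration only, a setup of the kind described might look like the sketch below. All object names (dbo.Documents, DocumentsCatalog, EnglishStoplist, UQ_Documents_stream_id) are hypothetical, and the sketch assumes FILESTREAM and non-transacted access are already configured for the database and that the semantic language statistics database is installed.

```sql
-- Hypothetical names throughout; this is not the attached script.
-- Assumes the database already has a FILESTREAM filegroup and a
-- FILESTREAM directory name with non-transacted access enabled.

-- FileTable with a named unique constraint on stream_id, so that the
-- full-text index below can reference it as its key index.
CREATE TABLE dbo.Documents AS FILETABLE
WITH (
    FILETABLE_DIRECTORY = N'Documents',
    FILETABLE_STREAMID_UNIQUE_CONSTRAINT_NAME = UQ_Documents_stream_id
);
GO

CREATE FULLTEXT CATALOG DocumentsCatalog;
GO

-- A stoplist whose entries are registered for English (LCID 1033) only.
CREATE FULLTEXT STOPLIST EnglishStoplist;
GO
ALTER FULLTEXT STOPLIST EnglishStoplist ADD N'the' LANGUAGE 1033;
GO

-- The index declares English for file_stream, but because a type column
-- (file_type) is present, the IFilter chosen for each file extension
-- decides which language is reported for the content it extracts.
CREATE FULLTEXT INDEX ON dbo.Documents
(
    file_stream TYPE COLUMN file_type LANGUAGE 1033 STATISTICAL_SEMANTICS
)
KEY INDEX UQ_Documents_stream_id
ON DocumentsCatalog
WITH STOPLIST = EnglishStoplist;
GO
```

With a setup like this on a server whose System Locale is Italian, the report says that stopwords in .txt files copied into the FileTable are checked against the System Locale language instead of the LANGUAGE 1033 declared on the index.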
Details
Posted by Matija Lah on 3/17/2013 at 10:09 AM
While this behavior makes perfect sense, it might be completely unexpected.

Please make sure this is documented appropriately!
Posted by Microsoft on 3/6/2013 at 8:22 AM
This is by design. In the attached script, the Text IFilter emits the System Locale for the chunks it produces, and we go by whatever the IFilter emits.
The full-text index language is used only when the indexed column is a plain-text column with no document type column.
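To see why the emitted language matters for stopwords, the same phrase can be parsed under two languages with sys.dm_fts_parser. This is only a sketch: it bypasses the IFilter entirely and uses the system stoplist (stoplist_id 0), so it shows how the language choice changes stopword detection, not the FileTable crawl itself. The sample phrase is made up, and the function requires sysadmin membership.

```sql
-- Parse one phrase as English (LCID 1033) and as Italian (LCID 1040),
-- using the system stoplist and accent-insensitive matching.
-- Rows whose special_term is 'Noise Word' were treated as stopwords.
SELECT 1033 AS lcid, display_term, special_term
FROM sys.dm_fts_parser(N'"the document and il documento"', 1033, 0, 0)
UNION ALL
SELECT 1040 AS lcid, display_term, special_term
FROM sys.dm_fts_parser(N'"the document and il documento"', 1040, 0, 0);
```

Comparing the special_term column between the two result sets shows which words each language treats as stopwords.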
Posted by Matija Lah on 7/16/2012 at 3:18 PM
Of course, if the language used for system locale is not supported by statistical semantic search, semanticsdb is not populated (which is, however, correctly reflected in the full-text crawl log).
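As a quick check of which languages statistical semantic search can handle on a given instance, the catalog views below can be queried. This is a sketch; what it returns depends on the installed semantic language statistics database.

```sql
-- Languages for which semantic language statistics are registered.
SELECT lcid, name
FROM sys.fulltext_semantic_languages
ORDER BY name;

-- Confirms that a semantic language statistics database is registered
-- on this instance (returns no rows if it is not).
SELECT database_id, register_date, version
FROM sys.fulltext_semantic_language_statistics_database;
```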
Workarounds
Posted by Matija Lah on 3/20/2013 at 8:54 AM
I've discussed a possible workaround in more detail in the following blog post:
http://milambda.blogspot.com/2013/03/sql-server-2012-filetables-text-files.html
Posted by Matija Lah on 3/17/2013 at 10:16 AM
Use a file format that supports language/culture settings which the corresponding IFilter can pick up; for example, Microsoft Word, XML, or even PDF.
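A minimal sketch of this workaround for XML follows. It assumes a FileTable named dbo.Documents (a hypothetical name) that already has a full-text index using file_type as its type column, and it assumes the installed XML IFilter honors the xml:lang attribute; neither the file name nor the content comes from the original report.

```sql
-- Check that filters are registered for the formats mentioned above.
-- Some formats (for example .docx or .pdf) may need an additional
-- filter pack to be installed before they appear here.
SELECT document_type, path
FROM sys.fulltext_document_types
WHERE document_type IN (N'.txt', N'.xml', N'.docx', N'.pdf');

-- Store the text as XML that declares its own language, rather than as
-- a .txt file. ASCII-only content cast from varchar keeps the bytes
-- valid for the default UTF-8 interpretation of an XML document.
INSERT INTO dbo.Documents (name, file_stream)
VALUES (
    N'sample-en.xml',
    CAST('<document xml:lang="en-US">The quick brown fox jumps over the lazy dog.</document>'
         AS varbinary(max))
);
```

After the crawl completes, the full-text crawl log and the semantic results can be checked to confirm whether the declared language was actually applied.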
Attachments
File Name: stoplist-test-2-connect.sql
Submitted On: 7/13/2012
File Size: 4 KB