Data Quality Services / DQS Cleansing Component performance too slow - by BI Monkey

Status: Fixed

This item has been fixed in the current or upcoming version of this product.
A more detailed explanation for the resolution of this particular item may have been provided in the comments section.

ID: 713837
Status: Closed
Type: Suggestion
Repros: 0
Opened: 12/13/2011 4:47:23 PM
Access Restriction: Public


When using the DQS Cleansing component in conjunction with a Knowledge Base, it takes too long to process even relatively small amounts of data (for more details see the linked post).

For example, processing 5 simple integer domains/columns across 10,000 rows takes over a minute.

This does not scale and cannot be used for Data Warehouse loads.
Posted by BI Monkey on 2/29/2012 at 2:00 AM
OK, I've road-tested this; the results can be found here:

So, the upshot is I can get about a 3x performance boost using the proposed tweaks, which is good, but even tuned it's still 5x slower than the client.
Posted by Microsoft on 1/31/2012 at 12:52 AM

To summarize the issue (as discussed over email), there are several things that can be done to improve the component's performance:

The DQS chunk size should be changed to 10K. In order to change it:
- Change DataQualityParameterFlow XML in A_CONFIGURATION table (in the DQS_MAIN database).
- The XML has only one parameter, DCChunkSize. Change it to 10000.
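The edit above amounts to bumping one attribute inside the stored XML. As a rough illustration, here is how that parameter change could be applied to the XML string (the element layout shown is an assumption for demonstration; the actual DataQualityParameterFlow XML in A_CONFIGURATION may be structured differently):

```python
import xml.etree.ElementTree as ET

# Hypothetical shape of the DataQualityParameterFlow XML -- the real
# structure stored in A_CONFIGURATION may differ; treat this as a sketch.
config_xml = '<Parameters><Parameter Name="DCChunkSize">1000</Parameter></Parameters>'

root = ET.fromstring(config_xml)
for param in root.iter("Parameter"):
    if param.get("Name") == "DCChunkSize":
        param.text = "10000"  # raise the chunk size from 1,000 to 10,000

updated_xml = ET.tostring(root, encoding="unicode")
print(updated_xml)
```

In practice you would make the equivalent change directly against the A_CONFIGURATION table in the DQS_MAIN database, as described above.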

In addition, it's possible to parallelize the work by using several components instead of just one (our recommendation is 4). In this case, do the following:
- Inside the SSIS Data Flow Task, right-click anywhere in the background and choose Properties.
- Change the DefaultBufferMaxRows property from the default 10000 to 40000.
- You may also need to change the DefaultBufferSize property – the default is 10 MB, which may or may not be enough for 40,000 records, depending on the size of a record.
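Whether 10 MB is enough is simple arithmetic: multiply the buffer row limit by your average row width and compare against DefaultBufferSize. A back-of-the-envelope sketch (the 200-byte row width is an assumption for illustration; measure your own rows):

```python
# Is the default 10 MB data flow buffer enough for 40,000 rows?
default_buffer_bytes = 10 * 1024 * 1024  # DefaultBufferSize default (10 MB)
max_rows = 40_000                        # proposed DefaultBufferMaxRows

bytes_per_row = 200                      # assumed average row width
needed_bytes = max_rows * bytes_per_row

print(needed_bytes, needed_bytes <= default_buffer_bytes)  # 8000000 True
```

At 200 bytes per row the default buffer suffices; wider rows (over ~262 bytes here) would require raising DefaultBufferSize as well.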

We plan to change the default chunk size accordingly at the first opportunity (however, not for RTM).

Thanks for your help,
Posted by BI Monkey on 1/5/2012 at 12:58 AM
Omer, from digging into the SSIS logs, the immediately obvious performance booster for the component would be to make the number of rows sent to the DQS server configurable instead of the currently fixed 1,000 rows, with a default of perhaps 10,000 rows, about the number of rows SSIS normally processes in a batch anyway. This would give the DQS server a bigger chunk to work with and improve performance.
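The suggestion above is essentially a configurable batching step: split the incoming rows into chunks of a chosen size and send each chunk to the server in one round trip. A minimal sketch of that idea (generic Python, not the component's actual internals):

```python
def batches(rows, chunk_size=10_000):
    """Yield rows in configurable chunks rather than a fixed 1,000-row batch.

    Each yielded chunk would correspond to one round trip to the DQS server,
    so a larger chunk size means fewer round trips for the same row count.
    """
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

# 25,000 rows with a 10k chunk size -> 3 round trips instead of 25.
chunks = list(batches(list(range(25_000))))
print(len(chunks), len(chunks[0]), len(chunks[-1]))  # 3 10000 5000
```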
Posted by Microsoft on 12/25/2011 at 2:09 AM
Hi BI Monkey,

Thanks for analyzing the SSIS cleansing component's performance. I would be happy to discuss this a little more, if you like, and I would be interested to know what your performance expectations are. My email is .

Anyway, some comments:
1. Since the component communicates with the DQS server, I'm not sure comparing it to most other SSIS components is relevant.
2. That said, we are aware of some performance issues with the component. For example, our interactive cleansing project (available from the client) runs faster than the SSIS component's cleansing, due to better parallel optimization in the interactive case. We hope to improve this aspect of SSIS in the near future.
3. One workaround (although not the most convenient one) with the current version is to have multiple components in the same data flow and load balance the work between them. Given appropriate hardware, this improves performance.
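Load balancing across several components in a data flow typically means splitting the rows into N roughly equal streams, for instance by row number modulo N (in SSIS this is commonly done with a Conditional Split ahead of the parallel components). A small sketch of that partitioning logic, outside of SSIS:

```python
# Fan rows out to 4 parallel cleansing components by row number mod 4.
# The row shape here is made up for illustration.
NUM_COMPONENTS = 4

rows = [{"row_id": i, "value": f"record-{i}"} for i in range(10)]
lanes = [[] for _ in range(NUM_COMPONENTS)]

for row in rows:
    lanes[row["row_id"] % NUM_COMPONENTS].append(row)

print([len(lane) for lane in lanes])  # [3, 3, 2, 2]
```

Each lane would feed its own DQS Cleansing component, so the work proceeds in parallel subject to available hardware.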

Omer Boker
PM, Data Quality Services
Posted by Phaneendra Babu Subnivis on 12/20/2011 at 11:25 PM
I think this is one of the things that needs quick improvement. We tried tweaking it, but in vain. Given how much richer MDS and DQS have become compared to what was offered in SQL Server 2008 R2, addressing this would be key and would make adopting them in applications much more likely.