Home Dashboard Directory Help
Search

Flat File Parser cannot import files with embedded text qualifiers by ManServ


Status: 

Active


105
0
Sign in
to vote
Type: Suggestion
ID: 312164
Opened: 11/25/2007 4:20:25 AM
Access Restriction: Public
5
Workaround(s)
view

Description

Flat File Parser isn't able to import files with embedded text qualifiers ("). This has been tested with all public SQL Server versions from 2005 up to 2008 R2 (longer than 5 years, three releases customers payed for) and according to Microsoft employee comments this isn't going to be implemented in service packs.
bcp.exe / BULK INSERT don't support qualified text at all.

Export the following table with the flat file parser and you'll see that it's not able to reimport it's own exported data:

CREATE TABLE [dbo].[FlatFileParserTest](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [TextFiled1] [nvarchar](max) NULL,
    [TextField2] [nvarchar](max) NULL,
    [Textfield3] [nvarchar](max) NULL,
    [Textfield4] [nvarchar](max) NULL,
    [Textfield5] [nvarchar](max) NULL,
    [Textfield6] [nvarchar](max) NULL,
    [Textfield7] [nvarchar](max) NULL,
CONSTRAINT [PK_FlatFileParserTest] PRIMARY KEY CLUSTERED ([ID] ASC))
INSERT INTO FlatFileParserTest (TextFiled1, TextField2, Textfield3, Textfield4, Textfield5, Textfield6, Textfield7) VALUES (N'1 inch', N'2"', N'3 inches', N'4, 5, and 6"', N'7 inches', N'8",9",10""', N'11 inches')

Also see:
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=239415 (SQL 2008 Flat File Provider still broken)
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=239415 (Flat file provider should support double qualified text import)
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=267529 (Embedded quotes in Flat File Import fails)
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=282396 (SSIS flat file parser does not read Column delimiters embedded in text data)
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=127114 (Flat File Source incorrectly parsing data when using Comma as the column delimiter)

A lot of example data is provided by Microsoft itself with the MSDN glossaries: ftp://ftp.microsoft.com/developr/msdn/newup/glossary/README.TXT (now available in the MSDN subscribers download section)

This is pathetic.
Details
Sign in to post a comment.
Posted by Stimo on 4/13/2012 at 7:07 AM
>>Posted by ChiefSoftwareNoobie on 12/06/2009 at 11:06
>>Is this STILL not corrected? My God.

Nope..
Posted by fatveg on 2/23/2012 at 10:05 AM
I am really amazed that something like this hasn't been fixed. Unfortunately I am having to import the files I have into MySQL instead -- NOT my preferred work-around!
Posted by thisisfutile1 on 10/20/2011 at 8:35 AM
Isn't "text qualifier" a database-101 concept.

"In the beginning, Codd created the relational database. Now the database was empty. Codd said, "let there be flat files" the first day. Codd said, "let there be delimited values" the second day. Codd said, "let those values include the delimiter if the user so chooses" the third day.

OK, maybe it's really database-103 ...

... WHY DOESN'T THIS WORK?!?
Posted by darchangel on 7/29/2011 at 11:00 PM
This is really pathetic. I don't wanna put many workarounds on my package, so please fix this.
Posted by iPeel on 5/31/2011 at 4:06 AM
In all the years I have been working with Microsoft SQL Server I have never been able to successfully import a text file using the "Import" function of management Studio. This is simply unacceptable. Add to that any errors produced rely on you having direct access to the developer who wrote it and you have an unmanageable feature.

How come BCP does a better job and is something like 15 years old?!

What the hell do Microsoft QA use to test Import functionality given I have never successfully imported any form of text file? CSV files and their dirivatives are the industry standard mechanisms for transferring files from one system to another. If this stuff was free I would understand, but the sheer costs of SQL Server gives you the assumption that you are geeting something for your money, or at least a fix for something so fundamental sometime in the next five years.
Posted by cameron_eldridge on 5/5/2011 at 6:50 PM
So as this issue is about 3 and a half years old now, I have to wonder what on earth Microsoft are doing? It's not like this is a complicated problem, they have products that handle this correctly already (Excel, JET DB drivers), if the SSIS team find it so hard to handle text files, why don't' they go ask the Excel guys how to do it? They seem to be able to actually implement simple features.
Posted by IHiJump on 4/15/2011 at 7:09 AM
Totally Absurd this is not supported and hasn't been fixed. Microsoft is so good at giving you 95% of what you need and then forcing you to spend 95% of your time trying to work around their idiocy because they didn't include one simple (and absolutely necessary) piece of functionality.
Posted by pjg2000 on 1/12/2011 at 12:27 PM
Really Microsoft. What's up with this? Please fix!
Posted by P. Sherman on 11/1/2010 at 8:57 AM
It's pretty ridiculous that this is not supported. What's even more ridiculous though is that the "Text Qualifier" option appears in the Flat File Connection Manager Editor and on the msdn help page http://msdn.microsoft.com/en-us/library/ms187017(SQL.90).aspx, making it appear as if it is supported. Yet another example of Microsoft playing a bait-and-switch on it's users.
Posted by TruthBeTold on 8/19/2010 at 2:32 PM
I have experience with a plethora of technologies both Microsoft and non-MS. One of the most annoying has been SSIS. My experience with SQL Server has been great. SSRS experiences are, for the most part, been favorable. But SSIS has been frustrating at BEST and and running down the street screaming "I'm Mad as Hell at wasting so much time for something that should have been simple" at other times. There are just so many things wrong or missing or just plain stupid. It makes you wonder 'What were they drinking when they were designing it?" MS keeps saying they are ready for the enterprise, for the corporate environment but this clearly tells me they still don't have a clue. In a large multi-national corporation you will come in contact with a variety of technologies many of which will not exchange data with each other. Exchanging files in a comma (tab, |) delimited, double quoted text fields has been around for a very long time. The additional complication these days are Unicode. In an ETL product the idea that you would not support this is astounding. MS lets get with the program!
Posted by Joseph Leathlean on 5/28/2010 at 5:40 AM
Well - they are now stating that it is again fixed in the 'next major release' - see https://connect.microsoft.com/SQLServer/feedback/details/560592/flat-file-connection-manager-not-handling-text-delimiters-in-csv-files#tabs
Posted by Joseph Leathlean on 5/20/2010 at 11:52 PM
SQL 2008 R2 RTM also still does not handle this correctly - submitted a new bug: https://connect.microsoft.com/SQLServer/feedback/details/560592/flat-file-connection-manager-not-handling-text-delimiters-in-csv-files

Please go and vote on it...
Posted by ManServ on 4/4/2010 at 1:36 PM
Microsoft posted an update that this will be fixed in the next "major" version in the comments section: https://connect.microsoft.com/SQLServer/feedback/details/281398/sql-2008-flat-file-provider-still-broken
Posted by GENEXT on 1/20/2010 at 11:24 PM
This is unacceptable.
May be in the next versions there will be even less features available?

Nowadays everybody is working in multi-platform environments.
At least put it on the "box" saying"those features/capabilities,.... .....are not available anymore".

Problems even for a simple export ??
This is how you simply look for alternatives in tools and software from other companies.

Will this simple issue be corrected "today!!" or never?!!!?
Posted by PublicProfile22490 on 1/20/2010 at 8:09 AM
I agree that this is very important. Importing a CSV file where there are both commas and quotation marks in my data is a standard condition. A tool whose purpose is to facilitate the design of import packages should be able to accomodate this situation, not force me to write a bunch of code to process the file. I don't have time for that or to study other workarounds.
Posted by Grontoblios on 12/15/2009 at 9:25 AM
Add my vote. I spent half an hour wondering what I was doing wrong, without considering the possibility that the SSIS 2008 Import/Export Wizard was too brain-damaged to do what I needed it to do.

An astonishing omission. Please fix it NOW. Workarounds are not sufficient. 2011 is not sufficient.
Posted by hbz2000 on 11/25/2009 at 2:25 PM
This is horrifying. We all know that the folks at Microsoft are not morons, but this is moronic, inexcusable, defective, broken, ...

FIX THIS! It's not that hard to do.
Posted by Daniel Adeniji on 11/13/2009 at 5:52 PM
BTW, as a follow-up to my earlier posting. The ability to export out embedded text qualifiers (") works in MS SQL Server 2008, but not in MS SQL Server 2005. And, the usage of Views is only helpful when out is exporting data from the DB to textfiles, not the import that us is being tracked here.
Posted by Daniel Adeniji on 11/13/2009 at 5:44 PM
I will like to add my vote for this 'bug' to be fixed as well. As MS SQL Server is used by many users who care so little about presence of Quotes in their data, especially ones that have millions of records having this bit of anonyance lessens the whole user experience. One workaround that seems to work is create a view that disposes of the double-quotes. But, obviously one ends creating a set of database objects that simply bandages \ hides the actual problem. And, even then, the ability to clutter up (by creating tracking objects) the original database is not always available.

create view dbo.v_FlatFileParserTest
as

select
        ID
        , replace(TextFiled1, '"', '') as TextField1
        , replace(TextField2, '"', '') as TextField2
        , replace(TextField3, '"', '') as TextField3
        , replace(TextField4, '"', '') as TextField4
        , replace(TextField5, '"', '') as TextField5
        , replace(TextField6, '"', '') as TextField6
        , replace(TextField7, '"', '') as TextField7
from dbo.FlatFileParserTest
Posted by WilliamH.GatesIII on 10/19/2009 at 8:03 AM
After the disaster that was SQL Server 2005's import tool, I was hoping that 2008 would be better -- and it is, but it would have been hard not to be. HOWEVER, not being able to import a basic flat file with embedded delimiters is stunning. I had to dig up an old copy of SQL Server 2000's DTS Import / Export Wizard to import this file.

The DTS Wizard is over 9 years old. How about making that tool available for download to work around 2008's severe lack of functionality in this area, even with the Unicode/UTF8 caveats that you will no doubt mention as the reason for the "new and improved" SSIS in 2005.

Thank you.
Posted by Microsoft on 9/16/2009 at 2:29 PM
SQL Server 2008 R2 is a point release which has very few Integration Services bug fixes and no functionallity changes. We hope to address this in our next major release is planned for the 2nd half of 2011.

- Matt Carroll (SSIS Dev Lead)
Posted by ManServ on 8/11/2009 at 11:36 AM
SQL Server 2008 R2 August CTP still isn't capable of importing RFC 4180 compliant csv files. Also Unicode supplementary characters and line breaks aren't supported with UTF8 files.
Posted by ChiefSoftwareNoobie on 6/12/2009 at 11:06 AM
Is this STILL not corrected? My God.
Posted by taok on 4/8/2009 at 7:28 AM
this has to be the strangest, most un-justifiable SQL Server bug ever! An ETL tool that cannot handle the SIMPLEST file format there is out there???
Posted by BetterToday on 8/27/2008 at 4:29 AM
"This issue doesn’t currently meet the requirements for correction in SQL 2008." *rofl* !!!

This bug renders the Flat File Connection Manager feature useless. If I have to write my own source/destination filter, then what's the benefit of SSIS anyway??
Posted by Microsoft on 8/26/2008 at 3:57 PM
This is a duplicate of another issue and we are planning to look address it in the next major release. In the meanwhile, we have created two samples on http://www.codeplex.com/SQLSrvIntegrationSrv that may be helpful: RegExFlatFileSource and DelimitedFileReader.
Posted by Catadmin on 8/26/2008 at 3:41 AM
I agree that this is severely limiting. I'm in a time crunch to get these packages converted due to a lot projects at our work place and an "end of lease" machine. Since the files are generated by third party software, I can't change the file format either. What's the use of having "Text Qualifier" listed in the Flat File Connector if we can't use it? If you're going to have this as an option on the connector, you can't just tell us to "fix the problem before OR after the import". If Microsoft is truly concerned about customer service, this issue should be fixed so we can auto-convert the data. The point of SSIS was, I thought, to make this sort of stuff easier on the developers and DBAs. Not harder.

My time is very valuble and it's a shame I might have to resort to yet another third party tool to get my data converted and put into SQL Server. At that point, I might as well convert all my databases to MySQL or some other database product and just use third party software to get all my data transformed as needed. Please fix this.
Posted by ManServ on 8/7/2008 at 4:09 PM
full RFC 4180 compatibility appreciated
Posted by Microsoft on 3/6/2008 at 1:30 PM
Thank you for your feedback and support of the SSIS product. This issue doesn’t currently meet the requirements for correction in SQL 2008. If you would like for us to consider this issue for correction after SQL 2008i RTM, please re-open or add your vote.

Again we appreciate your time and support.

-SSIS team
Posted by ManServ on 2/21/2008 at 12:32 AM
I hope it will be more useful than the already existing "UnDouble": http://www.microsoft.com/downloads/details.aspx?familyid=b51463e9-2907-4b82-a353-e15016486e1d&displaylang=en
Posted by Microsoft on 2/20/2008 at 3:25 PM
Even though we are unable to address this for our upcoming Katmai release, we are planning to provide package samples that use script component to parse flat file data that contains column delimiters as text data.

We will update you once these samples are either in a released CTP or are available on CodePlex.com.

Thanks,
SSIS Team
Sign in to post a workaround.
Posted by Rakesh Mishra on 5/1/2012 at 12:20 PM
The Flat File Source in SSIS 2012 supports embedded qualifiers and a varying number of columns. Good time to switch to SQL 2012.
Posted by The Oracle on 1/5/2012 at 8:11 AM
Here is a technique I worked out, based on inspiration by another source (which I cannot remember, sorry). This will handle embedded commas, using TSQL. You can see that "trailing" fields without quotes may end up clustered. These can be further parsed using additional CTEs, and the results related on ID:

DECLARE @Test    TABLE
(
    ID    INT IDENTITY
    , CharString    VARCHAR(500)
);

--No embedding
INSERT @Test (CharString) SELECT '{7C81B50A-A41D-468E-94DA-8BAB73F51169},"12345","Miss","Doe","Jane","","",1986-02-27,True' + '"';
--Embedded comma
INSERT @Test (CharString) SELECT '{1F147F09-0590-4A25-A031-7C96558F19A6},"12346","Mr.","Doe, Jr.","Samuel","John","",1986-02-27,True' + '"';

SELECT
    *
FROM
    @Test;


;WITH stringCTE AS
(
    SELECT
        CAST( 'Anchor' AS VARCHAR(50)) AS WhereFrom
        , t.id
        , t.CharString
        , 1 AS startPos
        , ISNULL( NULLIF( CHARINDEX( '"', t.CharString), 0) -1, LEN( t.CharString)) AS stopPos
        , 1 AS ColIndex
    FROM
        @Test t

    UNION ALL

    SELECT
        CAST( 'Recursive' AS VARCHAR(50)) AS WhereFrom
        , c.id
        , c.CharString
        , c.stopPos + 1 AS startPos
        , CHARINDEX( '"', c.CharString, c.stopPos + 1) AS stopPos
        , ColIndex + 1 AS ColIndex
    FROM
        stringCTE c
    WHERE
        CHARINDEX( '"', c.CharString, c.stopPos + 1) > 0
    
)
, ColumnList AS (
SELECT
    c1.ID
    , c1.ColIndex
    , SUBSTRING( c1.CharString, startPos, stopPos-startPos) AS Val
FROM
    stringCTE c1
WHERE
    stopPos > startPos
)
, Table1 AS (
------------------------------------------------------------------
/* Comment portion below if you want values in separate rows. */
--VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
SELECT
    PVT.ID
    , [1] AS ReplicationId
    , [3] AS ExternalId
    , [5] AS NamePrefix
    , [7] AS LastName
    , [9] AS FirstName
    , [11] AS MiddleName
    , [14] AS Cluster1
FROM
(
--^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    SELECT    
        *
    FROM
        ColumnList c1
    WHERE
        1=1
        AND Val <> ','
------------------------------------------------------------------
/* Comment portion below if you want values in separate rows. */
--VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
) AS P
PIVOT (
    MIN( P.Val)
    FOR P.ColIndex IN ([1],[3],[5],[7],[9],[11],[14])
) AS PVT
--^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
SELECT
    *
FROM
    Table1;


Posted by mosaic on 11/16/2011 at 9:43 AM
Instead of using SSIS, you may be able to use a query to read the source data.
The file needs to be on the same server

TRUNCATE TABLE myTestTable
INSERT INTO myTestTable
SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0'
,'Text;CharacterSet=Unicode;Database=C:\temp\Flat-file export\data\;'
,'SELECT * FROM [test.csv]') AS X

-- or:

DROP TABLE myTestTable
SELECT *
INTO myTestTable
FROM
(SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0'
,'Text;CharacterSet=Unicode;Database=C:\temp\Flat-file export\data\;'
,'SELECT * FROM [test.csv]')
) AS X
Posted by ManServ on 5/8/2008 at 3:09 PM
Hello sufc,
unfortunately your workaround doesn't work reliably:

Input:
"Global Flag must be one of ""BATTERYICON"", ""MULTIBATTERY"",""RESUMEPASSWORD"", ""WAKEONRING"", ""VIDEODIM""\n",,"글로벌 플래그는 ""BATTERYICON"", ""MULTIBATTERY"",""RESUMEPASSWORD"", ""WAKEONRING"", 또는 ""VIDEODIM""이어야 합니다.\n",,TXT,,All,Windows

Output:
"Global Flag must be one of 'BATTERYICON', 'MULTIBATTERY"",""RESUMEPASSWORD', 'WAKEONRING', 'VIDEODIM'\n",,"글로벌 플래그는 'BATTERYICON', 'MULTIBATTERY"",""RESUMEPASSWORD', 'WAKEONRING', 또는 'VIDEODIM'이어야 합니다.\n",,TXT,,All,Windows Server 2003 R2

with:
        Dim strTemp As String = My.Computer.FileSystem.ReadAllText("Input.txt")
        Dim re As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex("(?<!"",)""""(?!,"")", System.Text.RegularExpressions.RegexOptions.Multiline)
        My.Computer.FileSystem.WriteAllText("Output.txt", re.Replace(strTemp, "'"), False, New System.Text.UnicodeEncoding)

For now I use the Microsoft LogParser (http://www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07) although it doesn't support nvarchar(max)
Posted by Mark Guinness on 4/10/2008 at 11:41 AM
You can use a regular expression to identify and replace embedded quotes in the source text file, using a script component in SSIS:

Dim fs As String = File.ReadAllText(fn)
Dim re As Regex = New Regex("(?<!"",)""""(?!,"")", RegexOptions.Multiline)
File.WriteAllText(fn, re.Replace(fs, "'"))
File Name Submitted By Submitted On File Size  
flat file with embedded text qualifiers and headers - Unicode UTF-16 with BOM format.txt (restricted) 11/25/2007 -