
Incorrect file I/O when parsing UNIX-created files, by francisco_jorge


Status: Closed as Deferred


Type: Bug
ID: 777705
Opened: 1/28/2013 1:00:36 PM
Access Restriction: Public
Workarounds: 0
Users who can reproduce this bug: 1

Description

Although Visual Studio should recognize and correctly parse text files created in UNIX (with lines ending in only '\n' and not '\r\n'), this is not the case.

If the user is not aware of this issue, saving and restoring position pointers (like the ones used by "fgetpos/fsetpos" and "ftell/fseek") can irreversibly corrupt the contents of such files.

Below are a simple source file and two example files that demonstrate this issue.
(They are also attached to this bug report.)
Details
Posted by Microsoft on 2/17/2013 at 6:23 PM
Hello,

Thank you for reporting this issue. It is partially a bug and partially a known limitation of the Visual C++ CRT implementation. Our text-mode stream processing assumes that the text being read has Windows platform newlines (\r\n). Unfortunately, we will be unable to fix this for the next release of the Visual C++ CRT. While we would like to fix every bug, we are unable to do so due to time and resource constraints. We will keep this bug in our backlog and consider fixing it in a future release.

As a possible workaround, consider ingesting the file as binary data and performing newline translation after reading the file if the file may contain non-Windows newlines (\n without a preceding \r). Or, avoid use of the ftell and fseek families of functions when working with text files.
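The binary-mode workaround suggested above could look roughly like the following. This is only a minimal sketch, not code from the Visual C++ team: the function names `normalize_newlines` and `read_text_as_binary` are hypothetical, and it assumes the whole file fits in memory. Because the file is opened with "rb", the CRT performs no newline translation, and the same code path handles both "\r\n" and "\n" files.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: collapse every "\r\n" pair in buf to "\n", in place.
   Returns the new length. A lone '\r' is left untouched. */
size_t normalize_newlines(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t in = 0; in < len; ++in) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;               /* drop the '\r'; the '\n' is copied next */
        buf[out++] = buf[in];
    }
    return out;
}

/* Hypothetical helper: read a whole file in binary mode, then normalize
   its newlines. Returns a malloc'd buffer (caller frees) or NULL on error. */
char *read_text_as_binary(const char *path, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);

    FILE *fp = fopen(path, "rb");   /* "rb": no text-mode translation */
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) { fclose(fp); return NULL; }

    size_t n;
    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {           /* buffer full: double its capacity */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); fclose(fp); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) { free(buf); fclose(fp); return NULL; }
    fclose(fp);

    /* Translate only after the whole file is in memory, so a "\r\n"
       pair can never be split across two reads. */
    *out_len = normalize_newlines(buf, len);
    return buf;
}
```

A caller would then work with offsets into the returned buffer rather than with ftell/fseek values, which sidesteps the text-mode position arithmetic entirely.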

Note: Connect doesn't notify me about comments. If you have any further questions, please feel free to e-mail me.

James McNellis
Visual C++ Libraries
james.mcnellis@microsoft.com
Posted by Microsoft on 1/28/2013 at 7:21 PM
Thanks for your feedback.

We are rerouting this issue to the appropriate group within the Visual Studio Product Team for triage and resolution. These specialized experts will follow up on your issue.
Posted by XICO2KX on 1/28/2013 at 6:26 PM
A little more information about what's going on internally (char read and stored fpos_t value):

[filetest_win.txt] (39 bytes)
>,A,\n,1,\n,2,\n,3,\n,>,B,\n,1,\n,2,\n,3,\n,>,C,\n,1,\n,2,\n,3,\n,
1,2,4,5,7,8,10,11,13,14,15,17,18,20,21,23,24,26,27,28,30,31,33,34,36,37,39,

[filetest_unix.txt] (27 bytes)
>,A,\n,1,\n,2,\n,3,\n,>,B,\n,1,\n,2,\n,3,\n,>,C,\n,1,\n,2,\n,3,\n,
-11,-10,-8,-7,-5,-4,-2,-1,1,2,3,5,6,8,9,11,12,14,15,16,18,19,21,22,24,25,27,

Strangely, when reading UNIX files, the file position starts with a negative value, and it incorrectly assumes there's also an (invisible) '\r' character before each '\n', hence the missing values in the sequence.
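One way to make sense of the two sequences above is the small model below. It is an assumption that happens to reproduce the reported numbers, not something taken from the CRT source: the reported position is modeled as the file size on disk, minus the characters still unread in the buffer, minus one extra byte for every unread '\n' (each presumed to have been "\r\n" on disk). The name `modeled_position` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, NOT CRT source: the position reported after
   `consumed` characters of the (already translated) buffer were read.
   buf/len   : buffer contents as seen by the program ('\n' line ends)
   file_size : size of the file on disk, in bytes */
long modeled_position(const char *buf, size_t len, size_t consumed,
                      long file_size)
{
    assert(consumed <= len);

    long remaining = (long)(len - consumed);
    long newlines = 0;
    for (size_t i = consumed; i < len; ++i)
        if (buf[i] == '\n')
            ++newlines;     /* each unread '\n' is assumed to be "\r\n" */

    return file_size - remaining - newlines;
}
```

With the 27-character content shown above, a file size of 39 gives 1, 2, 4, ... 39 (the Windows sequence), while a file size of 27 gives -11, -10, -8, ... 27 (the UNIX sequence). Under this model the negative start is simply the 12 phantom '\r' bytes being subtracted from a file that never contained them.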
Posted by Microsoft on 1/28/2013 at 1:50 PM
Thank you for your feedback. We are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly (http://support.microsoft.com).
Attachment: testfiles.zip (submitted 1/28/2013, 1 KB)