| Author |
Message |
|
|
I've done this in C a few times.
CSV is more robust, since it has quoting and double quote doubled as an escape.
In CSV, it's much more compact, nearly as tight as simple delimited text, to only quote fields with embedded comma (,), carriage return or line feed as the need for quotes is rare. The nominal CSV record separator is cr-lf, but it generally works with either in my (not very wide or Mac-ish) experience.
An option to break up the data into multiple files if they get to 65536 lines might be useful for Excel.
For delimited text, I usually let the user specify a field and a record delimiter of their choice, one byte each.
It might be good to provide some options for reducing the precision of floats converted to text, as they can get bulky. Of course, many DB allow you to convert to scalar in SQL functions and casts, so this is a bonus bit.
In general, options to change the field separator, row separator, and remove the padding, header, row count, visually oriented blank lines and prompts are nice to have, in addition to the existing option to set he padding character. Steal inspiration from the options of isql, bcp, SQL*Plus, freebcp and tsql.
A reverse option, to insert select from the input records, is a real nice companion, providing a data movement tool between DB. A meta-query specifies which substrings of which fields go where (target list) and which records are skipped (simple where clause; but don't expect it to behave in a join). Options to specify a commit limit are good, too. If you can limit commit on either count or oldest row age, it becomes suitable for a near-real-time loader of whatever dribbles down the pipe. Smaller transactions avoid long locks, allow load restarts and can run faster.
BTW, I saw there was a new JAR, and downloaded it but have not tested it barefoot. Did you fix the 'keeps writing to stdout closed pipe' bug (adding *.check_error() periodically to see output EOF, and at EOF to see all errors)? I mention it here because doing *.check_error() both on buffered data max age and buffered data amount is nice, to limit latency in near-real-time situations and with slow queries.
|
 |
|
|
I use -spacer '^A' to differentiate spaces in data from space padding, and then do something like this:
tr -d '^A' <input_file | sed '
1,2d
4d
/^\xOD*$/d
$d
3s/^.*ColHeadOne/ColHeadOne/
s/"/""/g
s/ *| */","/g
s/\(.*\),"$/"\1/
' | sed '
$d
' > final_results.csv
I like to keep shell-isms on different lines than sed-isms, and have not found the -e necessary yet.
Note that CSV can tolerate embedded " only if they are doubled.
Last I checked, MS Access ignored CSV quoting, so I use tab, ^A or ~ delimited. Tab is hard to enter in GUI clients, ^A is a FIX separator, and ~ is the most unpopular character you can see.
This is still vulnerable to line feeds in column values, of course.
But the whole subject begs the question "Why not have a query output that option be in csv (and other simple delimited text), and make the header line optional?"
|
 |
|
|
Taking my cue from text handling products like MS ODBC Text, J-Stels, HXTT:
There should be an option for Header supression: yes/no
The delimiter choice should support escapes allowing any charater or string of characters. You should be able to set it whether you are generating delimited or rectangular output.
When making a delimited file, there should be no underlining or overlining and, optionally, one or no header line of column names. If names are requested, any blank labels should be populated with default values like COL_#.
The leading and trailing row column delimiters never appealed to me, although I guess they form a pretty box. Maybe they could be optional.
Padding the column delimiters with spaces never appealed to me, although I guess they give extra visual separation. Maybe they could be optional.
Headings on longvarchar are currently right justified, should probably be right for integers, centered for scalar and float presentations, left for char types.
Always but especially when generating delimited output, the user should be able to control NULL handling, by optionally supplying a desired string for NULL presentation in lieu of empty, by optionally asking for trimming, by optionally ensuring no char column is trimmed to empty when blank when empty is the NULL presentation.
BTW, columns are being truncated to 15 positions for varchar(-1), even with -w 50. Also with longvarchar(20), even with -w 50, but that's a bit less your problem.
Best regards,
David
|
 |
|
|
|
|