[hotjoe.com] HotJoe Java Help Forums
  [Search] Search   [Recent Topics] Recent Topics   [Hottest Topics] Hottest Topics   [Members]  Member Listing   [Groups] Back to home page 
[Register] Register / 
[Login] Login 
Visit java.com
Please send email if you are having login problems - see the posts below for more info.
Hotmail and Yahoo! users - please see the Hotmail post or the Yahoo! post for information on lost emails.
sed script to produce csv file  XML
Forum Index » Jisql
Author Message
hansley

Newbie

Joined: 07/30/2008 14:20:57
Messages: 1
Offline

I had the need, so just hacked this out and wanted to save others the pain. These two chained sed scripts will delete the header and footer rows, clean out the 1> bits, get rid of the line between the header and first data row, get rid of all extra whitespace, change pipes to ",", add " to the front of each line, and clean up ugly line endings. The only thing to change is to change ColHeadOne to whatever your first column header is. It dependes on your jisql output being named 'interim_results.csv'.

sed -e 's/^\x0D$//g' \
-e '/^$/d' \
-e '1,2d' \
-e '4d' \
-e '3s/^.*ColHeadOne/ColHeadOne/g' \
-e 'N;$!P;$!D;$d' < interim_results.csv | \
sed -e 's/ */ /g' \
-e 's/ *| */","/g' \
-e 's/^ */"/g' \
-e 's/,"*$//g' > final_results.csv

Enjoy. No warranties, expressed or implied. Your mileage may vary. No whales were harmed in the production of this script. Please recycle when done with it.
-Bill

DGPickett

Newbie

Joined: 01/04/2006 10:24:00
Messages: 3
Offline

I use -spacer '^A' to differentiate spaces in data from space padding, and then do something like this:

tr -d '^A' <input_file | sed '
1,2d
4d
/^\xOD*$/d
$d
3s/^.*ColHeadOne/ColHeadOne/
s/"/""/g
s/ *| */","/g
s/\(.*\),"$/"\1/
' | sed '
$d
' > final_results.csv

I like to keep shell-isms on different lines than sed-isms, and have not found the -e necessary yet.

Note that CSV can tolerate embedded " only if they are doubled.

Last I checked, MS Access ignored CSV quoting, so I use tab, ^A or ~ delimited. Tab is hard to enter in GUI clients, ^A is a FIX separator, and ~ is the most unpopular character you can see.

This is still vulnerable to line feeds in column values, of course.

But the whole subject begs the question "Why not have a query output that option be in csv (and other simple delimited text), and make the header line optional?"
[Email]
stdunbar

Newbie
[Avatar]

Joined: 06/22/2005 14:51:37
Messages: 703
Location: Superior, CO, USA
Offline

I think you or someone else had suggested that before. Jisql hasn't gotten the attention it needs recently so let me see what I can do. This sounds pretty straight forward.

Thanks for using the forums at hotjoe.com
[WWW] [Yahoo!] [ICQ]
DGPickett

Newbie

Joined: 01/04/2006 10:24:00
Messages: 3
Offline

I've done this in C a few times.

CSV is more robust, since it has quoting and double quote doubled as an escape.

In CSV, it's much more compact, nearly as tight as simple delimited text, to only quote fields with embedded comma (,), carriage return or line feed as the need for quotes is rare. The nominal CSV record separator is cr-lf, but it generally works with either in my (not very wide or Mac-ish) experience.

An option to break up the data into multiple files if they get to 65536 lines might be useful for Excel.

For delimited text, I usually let the user specify a field and a record delimiter of their choice, one byte each.

It might be good to provide some options for reducing the precision of floats converted to text, as they can get bulky. Of course, many DB allow you to convert to scalar in SQL functions and casts, so this is a bonus bit.

In general, options to change the field separator, row separator, and remove the padding, header, row count, visually oriented blank lines and prompts are nice to have, in addition to the existing option to set he padding character. Steal inspiration from the options of isql, bcp, SQL*Plus, freebcp and tsql.

A reverse option, to insert select from the input records, is a real nice companion, providing a data movement tool between DB. A meta-query specifies which substrings of which fields go where (target list) and which records are skipped (simple where clause; but don't expect it to behave in a join). Options to specify a commit limit are good, too. If you can limit commit on either count or oldest row age, it becomes suitable for a near-real-time loader of whatever dribbles down the pipe. Smaller transactions avoid long locks, allow load restarts and can run faster.

BTW, I saw there was a new JAR, and downloaded it but have not tested it barefoot. Did you fix the 'keeps writing to stdout closed pipe' bug (adding *.check_error() periodically to see output EOF, and at EOF to see all errors)? I mention it here because doing *.check_error() both on buffered data max age and buffered data amount is nice, to limit latency in near-real-time situations and with slow queries.
[Email]
 
Forum Index » Jisql
Go to:   
Powered by JForum 2.1.8 © JForum Team
This site built by Scott Dunbar of Xigole Systems. © 2005-2010 - Scott Dunbar
Java and the Java Get Powered logo are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
hotjoe.com and xigole.com have no affiliation with Sun Microsystems, Inc.