KOffice vs. OpenOffice

Steve Lamb grey at dmiyu.org
Sat Oct 4 03:10:29 BST 2008


P Kapat wrote:
> In such large scale situations, shell scripts
> become useless. Too many parameters to take care of while writing
> such a script - too much time investment! So I use OO.o Calc. Not
> really clean but more than sufficient for my purpose.

    Erm.  First off, if shell can't handle it I'd call that a failing of shell
and a poor choice to use it in the first place.  Secondly, it is hardly ever a
poor time investment to script a tedious task so that it fits your needs
exactly.

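#!/usr/bin/env python3
import csv

csvfile = 'col.csv'
csvout = 'row.csv'

# Read every row up front; newline='' lets the csv module handle line endings.
with open(csvfile, newline='') as infile:
    csvrows = list(csv.reader(infile))

# The longest input row decides how many rows the transposed file gets.
max_len = max((len(row) for row in csvrows), default=0)

newcsv = []
for idx in range(max_len):
    makerow = []
    for row in csvrows:
        try:
            makerow.append(row[idx])
        except IndexError:
            # Short row: pad with an empty cell.
            makerow.append('')
    newcsv.append(makerow)

with open(csvout, 'w', newline='') as outfile:
    csv.writer(outfile).writerows(newcsv)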

    Probably has bugs because it took me about 10 minutes to write that
completely off the top of my head.  It could use some slight improvements but
it is a proof-of-concept.  Another 5 minutes and it could have a default set
of files to transform or take a command-line argument and transform that
single file.  This will handle any CSV you throw at it and should, barring any
off-the-cuff-by-memory bugs, transform it from columns to rows (and back
again!).
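
    The command-line variant mentioned above might look something like the
sketch below.  It is not the script from this mail: it swaps the nested loops
for itertools.zip_longest, and the names transpose_rows and transpose_csv are
made up for illustration.

```python
import csv
import sys
from itertools import zip_longest


def transpose_rows(rows):
    """Turn columns into rows, padding short rows with empty cells."""
    return [list(col) for col in zip_longest(*rows, fillvalue='')]


def transpose_csv(src, dst):
    """Read the CSV at src and write its transpose to dst."""
    with open(src, newline='') as infile:
        rows = list(csv.reader(infile))
    with open(dst, 'w', newline='') as outfile:
        csv.writer(outfile).writerows(transpose_rows(rows))


if __name__ == '__main__':
    # Hypothetical usage: python transpose.py col.csv row.csv
    if len(sys.argv) == 3:
        transpose_csv(sys.argv[1], sys.argv[2])
```

    Running it a second time on its own output gives back the original layout,
as long as the input was rectangular; ragged rows come back padded with empty
cells.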

    Is this something that everyone can do presently?  No, of course not.  Did
I spend time learning Python?  Of course!  But is that time wasted?  Hardly.

> And what is the "philosophy" behind requiring data in rows as opposed
> to columns?

    In writing the above script I can see a possible reason.  If we're talking
about an array of arrays, the X axis (a row) is a single array while the Y axis
(a column) is a cross-section of elements from multiple arrays.
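
    A small illustration of that difference, assuming the data is held as a
plain Python list of lists (the table contents are made up):

```python
table = [
    ['a', 'b', 'c'],
    ['d', 'e', 'f'],
]

# A row is already one list, so fetching it is a single lookup.
first_row = table[0]

# A column has to be assembled by visiting every row in turn.
first_col = [row[0] for row in table]
```

    Anything that reads or writes a record at a time gets rows for free, while
columns always cost a pass over the whole table.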

> Lastly, this issue plagues every spreadsheet application on the planet
> except MS Excel 2007 (it existed in XP/2003 version too, but they
> fixed it in the "fancier" 2007 version ): Hard restrictions on the
> number of rows and columns. I regularly deal with csv files with more
> than 100K rows. All spreadsheet apps (OO.o Calc, KSpread, ....) have an
> upper bound of 65536 (= 2^16) rows. There is some low level
> machine-optimization reason to it, which I am not entirely aware of.

    No.  There is a high level one; it's called a database.  Generally
speaking, when you're handling more than a couple thousand elements you're
better off learning the basics of a database and using it for storage/retrieval
of data instead of a flat text file.  I believe the thinking is that if you
need more rows than a 16-bit index can address, you're on a large enough
project to warrant using a real database instead of an ad-hoc one.  But...

> Of course, these are technical data and a spreadsheet is not the right
> application for this. I know, I use R/Matlab for data analysis and
> PostgreSQL for storing these data. But hey I had to invest time behind
> setting up a postgres server and learning some elementary sql to just
> "see" the data.

...you knew that.  Yes, you had to put in some time to "see" the data but the
time saved from the proper application of that knowledge is returned to you
many times over.
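
    As a rough idea of how little it takes to "see" data past the spreadsheet
row limit, here is a minimal sketch using Python's bundled sqlite3 module
rather than PostgreSQL.  The samples table and its contents are invented for
the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # a file path would make it persistent
conn.execute('CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL)')

# 100,000 made-up rows, already past the 65536-row spreadsheet limit.
conn.executemany(
    'INSERT INTO samples (id, value) VALUES (?, ?)',
    ((i, i * 0.5) for i in range(100000)),
)
conn.commit()

# Counting and filtering are one-line queries, whatever the row count.
row_count = conn.execute('SELECT COUNT(*) FROM samples').fetchone()[0]
big_values = conn.execute(
    'SELECT COUNT(*) FROM samples WHERE value > ?', (40000,)
).fetchone()[0]
```

    The same two queries would work unchanged against a table loaded from a
real CSV file.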

-- 
         Steve C. Lamb         | But who can decide what they dream
       PGP Key: 1FC01004       |      and dream I do
-------------------------------+---------------------------------------------
