Linux Archive

Linux Archive (http://www.linux-archive.org/)
-   Gentoo User (http://www.linux-archive.org/gentoo-user/)
-   -   join two tab-separate-value files without join field (http://www.linux-archive.org/gentoo-user/94422-join-two-tab-separate-value-files-without-join-field.html)

Tim 05-24-2008 04:06 AM

join two tab-separate-value files without join field
 
Zhang Weiwu wrote:
> Hi.
>
> I got a datasheet from my colleague in MS Excel format and I intend to
> process that file with my awk/sed knowledge. The problem is: he sent me
> two Excel files each with 2134 records, in fact there should be only one
> excel file with 2134 rows and 295 columns, but MS Excel can only handle
> 256 data columns, so he split the datasheet vertically so he can manage
> to send to me.
>
> Now I saved both file to tab-separated-value format, how do I join them?
>
> I could have used join(1) but that require a join field, an ID of some
> sort. I think of this:
>
> $ grep -n ' left.tsv | sed 's/:/ /'> left.forjoin
> $ grep -n ' right.tsv | sed 's/:/ /'> right.forjoin
> $ join -t " " left.forjoin right.forjoin > result.tsv
> (note that for join's -t parameter somehow I need to manage to get a tab
> between the quotes)
>
> Yes I achieved what I want, but that looks complex. Is there a simpler
> way? Thanks in advance.
>
> I know OpenOffice 3.0 can handle up to 1024 data columns. It's difficult
> to convince anyone to switch to OOO because here in China MS Office
> costs only 0$. I also could use OOO3.0 for doing the join but I wish to
> know the commandline way:)
>
Got perl?

#!/usr/bin/perl

if($#ARGV < 1) {
print "Arguments: <file1> <file2>
";
exit(1);
}

open(FIRSTFILE, $ARGV[0]);
open(SECONDFILE, $ARGV[1]);
@first = <FIRSTFILE>;
@second = <SECONDFILE>;

$i = 0;
for($i = 0;$i < 2; $i++) {
$tmp1 = $first[$i];
$tmp1 =~ s/
//g;
$tmp2 = $second[$i];
$tmp2 =~ s/
//g;

$str = $tmp1 . " " . $tmp2 . "
";
print $str;
}

close(FIRSTFILE);
close(SECONDFILE);

This is likely not the best or fastest way to do it, and I don't have a
dataset as large as yours readily available for testing, but it seems to
work.

-Tim
--
gentoo-user@lists.gentoo.org mailing list

Zhang Weiwu 05-24-2008 04:18 AM

join two tab-separate-value files without join field
 
Hi.

I got a datasheet from my colleague in MS Excel format and I intend to
process that file with my awk/sed knowledge. The problem is: he sent me
two Excel files each with 2134 records, in fact there should be only one
excel file with 2134 rows and 295 columns, but MS Excel can only handle
256 data columns, so he split the datasheet vertically so he can manage
to send to me.

Now I saved both file to tab-separated-value format, how do I join them?

I could have used join(1) but that require a join field, an ID of some
sort. I think of this:

$ grep -n ' left.tsv | sed 's/:/ /'> left.forjoin
$ grep -n ' right.tsv | sed 's/:/ /'> right.forjoin
$ join -t " " left.forjoin right.forjoin > result.tsv
(note that for join's -t parameter somehow I need to manage to get a tab
between the quotes)

Yes I achieved what I want, but that looks complex. Is there a simpler
way? Thanks in advance.

I know OpenOffice 3.0 can handle up to 1024 data columns. It's difficult
to convince anyone to switch to OOO because here in China MS Office
costs only 0$. I also could use OOO3.0 for doing the join but I wish to
know the commandline way:)

--

Real Softservice

Huateng Tower, Unit 1788
Jia 302 3rd area of Jinsong, Chao Yang

Tel: +86 (10) 8773 0650 ext 603
Mobile: 135 9950 2413
http://www.realss.com

--
gentoo-user@lists.gentoo.org mailing list

Etaoin Shrdlu 05-24-2008 10:08 AM

join two tab-separate-value files without join field
 
On Saturday 24 May 2008, 06:18, Zhang Weiwu wrote:

> Hi.
>
> I got a datasheet from my colleague in MS Excel format and I intend to
> process that file with my awk/sed knowledge. The problem is: he sent
> me two Excel files each with 2134 records, in fact there should be
> only one excel file with 2134 rows and 295 columns, but MS Excel can
> only handle 256 data columns, so he split the datasheet vertically so
> he can manage to send to me.
>
> Now I saved both file to tab-separated-value format, how do I join
> them?

IIUC, you have two files like this:


file1:

col1,col2,col3, ... ,col256
(2134 rows)


file2:

col257,col258, ... ,col295
(2134 rows)


If my understanding is correct, then

paste -d, file1 file2

should do what you want.
Hope this helps.
--
gentoo-user@lists.gentoo.org mailing list

Etaoin Shrdlu 05-24-2008 10:11 AM

join two tab-separate-value files without join field
 
On Saturday 24 May 2008, 12:08, Etaoin Shrdlu wrote:

> If my understanding is correct, then
>
> paste -d, file1 file2

If fields are separated by tabs instead of commas, then it's even easier:

paste file1 file2

since "paste" will use <tab> as default delimiter.
--
gentoo-user@lists.gentoo.org mailing list

Zhang Weiwu 05-24-2008 03:28 PM

join two tab-separate-value files without join field
 
Etaoin Shrdlu wrote:
> On Saturday 24 May 2008, 12:08, Etaoin Shrdlu wrote:
>
>> If my understanding is correct, then
>>
>> paste -d, file1 file2
>
> If fields are separated by tabs instead of commas, then it's even easier:
>
> paste file1 file2
>
> since "paste" will use <tab> as default delimiter.

Hi. This is exactly what I am looking for. Thank you very much!
--
gentoo-user@lists.gentoo.org mailing list


All times are GMT. The time now is 08:08 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.