"Dan Track" 06-25-2008 11:27 AM

Understanding how dd works
 
Hi

I've got a Xen VM image file called test. If I copy it with dd, I get the following:
dd if=/opt/xen/test of=/opt/test-vm.img bs=4096
du -s /opt/xen/test = 1934112
du -s /opt/test-vm.img = 26240040

My question is: why is test-vm.img larger than the original?

Thanks
Dan

--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list

Chris G 06-25-2008 11:48 AM

Understanding how dd works
 
On Wed, Jun 25, 2008 at 12:27:04PM +0100, Dan Track wrote:
> Hi
>
> I've got a xen vm file called test, if I copy it with dd I get the following
> dd if=/opt/xen/test of=/opt/test-vm.img bs=4096
> du -s /opt/xen/test = 1934112
> du -s /opt/test-vm.img = 26240040
>
> My question is why is the test-vm.img larger in size than the original?
>
Perhaps because the original file is 'sparse', i.e. it has large
unused chunks in it. When the file is first created these chunks are
unallocated and use no space; space is allocated only when they are
written to. However, when you dd the file, dd writes everything
(including the 'nul' data) to the destination file.
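
As a rough check on the numbers in the question, here is a small Python sketch that converts the two du figures to GiB. It assumes du was reporting 1 KiB units (the usual GNU du default); the thread does not state the unit, so treat that as an assumption.

```python
# Sizes reported by `du -s` in the question.
# ASSUMPTION: du reports 1 KiB units (GNU du default); not stated in the thread.
KIB = 1024

def kib_to_gib(kib_blocks):
    """Convert a count of 1 KiB blocks to GiB."""
    return kib_blocks * KIB / 1024**3

original_kib = 1934112   # du -s /opt/xen/test
copy_kib = 26240040      # du -s /opt/test-vm.img

print(f"original: {kib_to_gib(original_kib):.2f} GiB allocated")
print(f"copy:     {kib_to_gib(copy_kib):.2f} GiB allocated")
print(f"ratio:    {copy_kib / original_kib:.1f}x")
```

Under that assumption the original occupies a bit under 2 GiB on disk while the copy occupies about 25 GiB, which is what you would expect if the image is a roughly 25 GiB sparse file with only part of it written.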

--
Chris Green


"Dan Track" 06-25-2008 12:31 PM

Understanding how dd works
 
Thanks for the heads up on this. If the data blocks don't have
anything written into them, then what data is written into them when
using dd? If I restore the dd image, will the blocks then be in the
same state, i.e. unwritten?

Also, following on from this, if I create a file using dd, let's say 2GB,
how does the filesystem know that all these blocks belong to the file
myfile.img, and where is the information stored to say whether a block
has data written into it or not?

Thanks
Dan




Chris G 06-25-2008 12:40 PM

Understanding how dd works
 
On Wed, Jun 25, 2008 at 01:31:55PM +0100, Dan Track wrote:
> Thanks for the heads up on this. If the data blocks don't have
> anything written into them, then what data is written into them when
> using dd?

dd will write what is returned by the operating system when you read
unused sections of a file, probably zeroes.

> if I restore the dd image will the blocks then be in the
> same state i.e unwritten to?

No, you'll get a 'fully populated' file with the previously empty bits
full of (probably) zeroes. It'll work fine though, or should do.

>
> Also following on from this if I create a file using dd let's say 2GB,
> how does the filesystem know that all these blocks belong to the file
> myfile.img, and where is the information stored to say that a block
> has data written into it or not?

It knows because dd explicitly writes to every byte of the file. The
information is kept in the filesystem's own metadata (inodes, etc.).

You get a file with unallocated 'holes' by opening a file, writing a
bit of data at the start (maybe) and then doing a large seek forward
in the file and writing some more data. The section you seek over (if
it's large enough, presumably larger than the default allocation block
size) will be empty and unwritten and will occupy no space on disk.
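
A minimal Python sketch of the seek-then-write sequence described above, for anyone who wants to try it. The file name and the 64 MiB gap are arbitrary choices of mine, and the allocated figure assumes st_blocks counts 512-byte units, as it does on Linux. Whether the gap really ends up unallocated depends on the filesystem.

```python
import os
import tempfile

def make_sparse_file(path, hole_bytes):
    """Write a little data, seek forward over a gap, then write some more."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, b"start")
        os.lseek(fd, hole_bytes, os.SEEK_CUR)  # skip forward without writing
        os.write(fd, b"end")
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "holey.img")   # hypothetical file name
    hole = 64 * 1024 * 1024                 # 64 MiB gap, arbitrary
    make_sparse_file(path, hole)
    st = os.stat(path)
    logical_size = st.st_size               # what "ls -l" would show
    allocated = st.st_blocks * 512          # roughly what "du" would show
    print("logical size:", logical_size)
    print("allocated   :", allocated)
```

On a filesystem that supports holes, the allocated figure should come out far smaller than the logical size.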

--
Chris Green


"Patrick O'Callaghan" 06-25-2008 01:19 PM

Understanding how dd works
 
On Wed, 2008-06-25 at 13:31 +0100, Dan Track wrote:
> Thanks for the heads up on this. If the data blocks don't have
> anything written into them, then what data is written into them when
> using dd? if I restore the dd image will the blocks then be in the
> same state i.e unwritten to?
>
> Also following on from this if I create a file using dd let's say 2GB,
> how does the filesystem know that all these blocks belong to the file
> myfile.img, and where is the information stored to say that a block
> has data written into it or not?

It's important to understand that this has nothing to do with 'dd', it's
simply how the Unix filesystem works, and since Linux is "culturally
derived" from Unix, it does the same thing. You would see the same
effect just by using 'cp' or even 'cat'.

The basic points are these (I'm skating over a lot for clarity):

1) The system maintains a list of every physical disk block assigned to
the file (thus one of the things the 'fsck' command checks is that every
block in the filesystem is either assigned to a file or is on the free
list).

2) When a process writes to a file it need not do so sequentially
because the lseek(2) operation allows it to move its "current position"
in the file. Furthermore, it's permissible to move the pointer beyond
the current end of the file. If a process does this by a large enough
amount and then writes data, the intervening space may have no disk
blocks assigned to it (depending on the distance moved and block
alignment). This is called a 'hole'. Files with holes in them are called
'sparse'.

3) The system keeps a separate count of the logical size of the file.
Because of the holes the logical size may be different from the physical
size. "ls -l" shows the logical size. "du" shows the real physical size
and may be different.

4) When a process tries to read from a hole, the system simply returns
nulls for the corresponding bytes. However if a process writes nulls
into a file, the system does *not* make any effort to detect them as a
special case, so they are simply written as any other data and the
system will allocate blocks to them. This happens when 'dd' (or 'cp' or
'cat') copies a file, so the resulting file can be larger than the
original.
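
Points 3 and 4 can be tried out with a short Python sketch. It is only an illustration under some assumptions: the file names and the 32 MiB hole are made up, the 4096-byte block simply mirrors the bs=4096 in the question, and st_blocks is taken to be in 512-byte units as on Linux. The exact allocated numbers will vary by filesystem.

```python
import os
import tempfile

BLOCK = 4096  # mirrors the bs=4096 used in the question

def naive_copy(src, dst):
    """Plain read/write loop: zeros read from holes are written out as data."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(BLOCK):
            fout.write(chunk)

def sizes(path):
    """Return (logical size as in 'ls -l', allocated bytes as in 'du')."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sparse.img")   # hypothetical names
    dst = os.path.join(tmp, "copy.img")
    with open(src, "wb") as f:
        f.seek(32 * 1024 * 1024)            # leave a 32 MiB hole at the start
        f.write(b"data")
    with open(src, "rb") as f:
        hole_sample = f.read(16)            # read from inside the hole
    naive_copy(src, dst)
    src_sizes, dst_sizes = sizes(src), sizes(dst)
    print("read from hole:", hole_sample)
    print("source (logical, allocated):", src_sizes)
    print("copy   (logical, allocated):", dst_sizes)
```

Both files report the same logical size, but on a filesystem with sparse-file support the copy should show many more allocated bytes than the source, because the copy loop wrote the nulls out as ordinary data.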

Note that 'rsync --sparse' will preserve holes when it can.

Note also that if you're not careful you can back up a file or even a
filesystem that you can't restore because it's too big, especially if
copying it to some medium (e.g. a tape drive or non-UNIX disk system)
that can't handle sparse files.

Hope this helps.

poc


"Dan Track" 06-25-2008 01:49 PM

Understanding how dd works
 


Hi Patrick,

Really appreciate the detailed explanation. It's a real eye opener.
Can you point me to any docs that I could read around this subject?

Thanks
Dan


"Patrick O'Callaghan" 06-25-2008 04:00 PM

Understanding how dd works
 
On Wed, 2008-06-25 at 14:49 +0100, Dan Track wrote:
> Really appreciate the detailed explanation. It's a real eye opener.
> Can you point me to any docs that I could read around this subject?

Any book on Unix internals or Unix programming. http://tldp.org/ or
http://www.linux-tutorial.info/index.php would be good places to start.
Also "man lseek".

poc


"Mikkel L. Ellertson" 06-25-2008 04:02 PM

Understanding how dd works
 
Patrick O'Callaghan wrote:
> On Wed, 2008-06-25 at 13:31 +0100, Dan Track wrote:
>> Thanks for the heads up on this. If the data blocks don't have
>> anything written into them, then what data is written into them when
>> using dd? if I restore the dd image will the blocks then be in the
>> same state i.e unwritten to?
>>
>> Also following on from this if I create a file using dd let's say 2GB,
>> how does the filesystem know that all these blocks belong to the file
>> myfile.img, and where is the information stored to say that a block
>> has data written into it or not?
>
> It's important to understand that this has nothing to do with 'dd', it's
> simply how the Unix filesystem works, and since Linux is "culturally
> derived" from Unix, it does the same thing. You would see the same
> effect just by using 'cp' or even 'cat'.
>


cp knows how to handle sparse files. From the cp man page:

By default, sparse SOURCE files are detected by a crude heuristic
and the corresponding DEST file is made sparse as well. That is the
behavior selected by --sparse=auto. Specify --sparse=always to
create a sparse DEST file whenever the SOURCE file contains a long
enough sequence of zero bytes. Use --sparse=never to inhibit
creation of sparse files.


So I would think that cp would give him a good copy...
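
To illustrate the idea behind --sparse=always (skip long runs of zero bytes instead of writing them), here is a rough Python sketch. It is not how cp is actually implemented; the 4096-byte block size, the function name and the file names are my own assumptions.

```python
import os
import tempfile

BLOCK = 4096  # arbitrary block size for zero detection

def sparse_copy(src, dst):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zero = bytes(BLOCK)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(BLOCK):
            if chunk == zero[:len(chunk)]:
                fout.seek(len(chunk), os.SEEK_CUR)  # leave a hole
            else:
                fout.write(chunk)
        fout.truncate()  # fix the logical size if the file ends in a hole

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.img")      # hypothetical names
    dst = os.path.join(tmp, "dst.img")
    with open(src, "wb") as f:
        f.write(b"head")
        f.seek(8 * 1024 * 1024)             # 8 MiB offset, arbitrary
        f.write(b"tail")
    sparse_copy(src, dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        same = a.read() == b.read()
    dst_size = os.stat(dst).st_size
    print("contents identical:", same)
    print("copy allocated bytes:", os.stat(dst).st_blocks * 512)
```

The copy reads back byte for byte the same as the source, and on a filesystem that supports holes it should occupy only a few blocks.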

Mikkel
--

Do not meddle in the affairs of dragons,
for thou art crunchy and taste good with Ketchup!


"Patrick O'Callaghan" 06-25-2008 04:29 PM

Understanding how dd works
 
On Wed, 2008-06-25 at 11:02 -0500, Mikkel L. Ellertson wrote:
> cp knows how to handle sparse files. From the cp man page:
>
> By default, sparse SOURCE files are detected by a crude heuristic
> and the corresponding DEST file is made sparse as well. That is the
> behavior selected by --sparse=auto. Specify --sparse=always to
> create a sparse DEST file whenever the SOURCE file contains a long
> enough sequence of zero bytes. Use --sparse=never to inhibit
> creation of sparse files.
>
> So I would think that cp would give him a good copy...

True. I've been using cp for more than 30 years so I hadn't looked at
the man page in the last decade or two :-)

poc


John Thompson 06-25-2008 09:16 PM

Understanding how dd works
 
On 2008-06-25, Dan Track <dan.track@gmail.com> wrote:

> Really appreciate the detailed explanation. It's a real eye opener.
> Can you point me to any docs that I could read around this subject?

http://en.wikipedia.org/wiki/Sparse_file

--

John (john@os2.dhs.org)
