dd – how to extract a subsection of a file with 2 offsets?

Problem :

I have a file from which I want to extract the section that starts at byte offset 3020852 and ends at byte offset 13973824.

There’s some variation of this command: dd ibs=X obs=Y skip=1 count=1 that I haven’t got working yet.

Solution :

There are several ways to do this, as you can read in this similar question. I’ll give you the (in my opinion most “idiomatic”) head | tail approach and the dd approach.

head --bytes=<end_offset> in_file.bin | tail --bytes=<end_offset - start_offset> > out_file.bin

Alternatively:

dd bs=1 skip=<start_offset> count=<end_offset - start_offset> < in_file.bin > out_file.bin
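Both commands treat the range as half-open: bytes from start_offset up to, but not including, end_offset. As a minimal sketch (the function names and the sample data are made up for illustration, and the model assumes the file is at least end_offset bytes long), the two selections can be modeled on an in-memory byte string to check that they pick the same bytes:

```python
def head_tail(data: bytes, start: int, end: int) -> bytes:
    """Model of `head --bytes=end | tail --bytes=(end - start)`."""
    first = data[:end]                       # head: keep the first `end` bytes
    n = end - start
    return first[-n:] if n > 0 else b""      # tail: keep the last n of those


def skip_count(data: bytes, start: int, end: int) -> bytes:
    """Model of `dd bs=1 skip=start count=(end - start)`."""
    return data[start:start + (end - start)]


# Hypothetical sample data, long enough to contain the whole range.
sample = bytes(range(256)) * 4
assert head_tail(sample, 100, 300) == skip_count(sample, 100, 300)

# Byte count for the offsets in the question.
COUNT = 13973824 - 3020852
print(COUNT)  # 10952972
```

If the input is shorter than end_offset, the head | tail model no longer matches the slice, because tail then counts back from the real end of the file.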

With help from @agtoever and @tom-yan, this is the fastest dd invocation I have found (the skip_bytes and count_bytes flags require GNU dd):

dd if=somefile of=somefile2 skip=$start_offset count=$(($end_offset-$start_offset)) iflag=skip_bytes,count_bytes

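I left bs unspecified, so dd uses its default of 512 bytes. Because skip and count are given in bytes here, bs can be set to anything without changing the result. A 1 MiB bs (bs=1M) is a good rule of thumb.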
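To get a feel for why the block size matters, here is a rough back-of-the-envelope sketch. It assumes dd issues about one read and one write per block and ignores every other cost; the helper name blocks_needed is made up for this illustration.

```python
def blocks_needed(count: int, bs: int) -> int:
    """Number of bs-sized blocks needed to cover `count` bytes, rounded up."""
    return (count + bs - 1) // bs


count = 13973824 - 3020852             # bytes to copy for the offsets in the question

print(blocks_needed(count, 1))         # 10952972 blocks with bs=1
print(blocks_needed(count, 512))       # 21393 blocks with the 512-byte default
print(blocks_needed(count, 1 << 20))   # 11 blocks with bs=1M
```

Under that assumption, bs=1 needs roughly a million times more read/write calls than bs=1M for this range.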

Thanks.

Where existing tools fail, write your own:

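#!/usr/bin/env python
start, end = 3020852, 13973824
with open("input.bin", "rb") as inf:
    with open("output.bin", "wb") as outf:
        # jump to the first byte of the range
        inf.seek(start)
        # read the whole range (end is exclusive) in one go
        data = inf.read(end-start)
        outf.write(data)
        # just in case: fails if the input was shorter than end
        assert(inf.tell() == end)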

The total size isn’t large so it just reads the whole block into RAM at once. If you wanted to copy several GB block-by-block, you could do it this way:

#!/usr/bin/env python
start = 3020852
end = 13973824
size = end - start   # number of bytes still to copy
bs = 32 << 20   # block size: 32 MiB
with open("input.bin", "rb") as inf:
    with open("output.bin", "wb") as outf:
        inf.seek(start)
        while size > 0:
            data = inf.read(min(size, bs))
            if not data:
                # input ended before the end offset; stop instead of looping forever
                break
            outf.write(data)
            size -= len(data)
        # just in case
        assert(inf.tell() == end)
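The block-by-block loop above can also be wrapped in a reusable function. The following is only a sketch: the name extract_range, its parameters, and the 1 MiB default block size are choices made for this example, and it stops quietly (rather than asserting) if the input ends before the end offset.

```python
import os
import tempfile


def extract_range(src, dst, start, end, bs=1 << 20):
    """Copy bytes [start, end) from src to dst; return the number of bytes copied."""
    remaining = end - start
    copied = 0
    with open(src, "rb") as inf, open(dst, "wb") as outf:
        inf.seek(start)
        while remaining > 0:
            data = inf.read(min(remaining, bs))
            if not data:
                # Input ended before the end offset was reached.
                break
            outf.write(data)
            remaining -= len(data)
            copied += len(data)
    return copied


if __name__ == "__main__":
    # Small self-contained demo on a throwaway file.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.bin")
        dst = os.path.join(tmp, "output.bin")
        with open(src, "wb") as f:
            f.write(bytes(range(256)) * 40)   # 10240 bytes of sample data
        print(extract_range(src, dst, 1000, 5000))  # 4000
```

The caller can compare the return value with end - start to detect a short input.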
