Additional Notes
****************

Here are miscellaneous notes about topics that might not be covered in
enough detail in the usage section.


"--chunker-params"
==================

The chunker params influence how input files are cut into pieces
(chunks), which are then considered for deduplication. They also have
a big impact on resource usage (RAM and disk space), because the
resources needed depend in part on the total number of chunks in the
repository (see Indexes / Caches memory usage for details).

"--chunker-params=buzhash,10,23,16,4095" results in fine-grained
deduplication and creates a large number of chunks, so it uses a lot
of resources to manage them. This is good for relatively small data
volumes and if the machine has plenty of free RAM and disk space.

"--chunker-params=buzhash,19,23,21,4095" (default) results in
coarse-grained deduplication and creates far fewer chunks, so it uses
fewer resources. This is good for relatively big data volumes and if
the machine has relatively little free RAM and disk space.
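
To get a feel for the difference between these two settings, the
following is a rough sketch that estimates chunk counts. It assumes
that the third numeric buzhash parameter (HASH_MASK_BITS) gives a
target chunk size of about 2^HASH_MASK_BITS bytes, and it ignores the
minimum and maximum chunk size limits as well as small files. The
function name "estimate_chunks" is made up for this illustration.

```shell
# Rough estimate only: chunks ~= total bytes / 2^HASH_MASK_BITS.
# Assumption: ignores min/max chunk size and files smaller than a chunk.
estimate_chunks() {
    total_bytes=$1
    hash_mask_bits=$2
    echo $(( total_bytes / (1 << hash_mask_bits) ))
}

one_tib=$(( 1 << 40 ))
estimate_chunks "$one_tib" 16   # buzhash,10,23,16,4095 -> 16777216
estimate_chunks "$one_tib" 21   # buzhash,19,23,21,4095 -> 524288
```

Under these assumptions, the fine-grained setting produces about 32
times as many chunks for the same amount of data.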

"--chunker-params=fixed,4194304" results in fixed 4 MiB sized block
deduplication and is more efficient than the previous example when
used for block devices (like disks, partitions, LVM LVs) or raw disk
image files.

"--chunker-params=fixed,4096,512" results in fixed 4 KiB sized blocks,
but the first header block will only be 512 B long. This might be
useful to deduplicate files with 1 header + N fixed-size data blocks.
Be careful not to produce too many chunks (such as using a small block
size for huge files).
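
The chunk count for the fixed chunker can be estimated up front. The
sketch below assumes that a trailing partial block becomes a chunk of
its own; the function name "fixed_chunk_count" is hypothetical and not
part of borg.

```shell
# Estimate chunks for "fixed,BLOCK_SIZE[,HEADER_SIZE]" on a single file.
# Assumption: a trailing partial block still counts as one chunk.
fixed_chunk_count() {
    file_size=$1
    block_size=$2
    header_size=${3:-0}
    chunks=0
    remaining=$file_size
    if [ "$header_size" -gt 0 ] && [ "$remaining" -gt 0 ]; then
        # The header block is cut off first and is its own chunk.
        chunks=1
        if [ "$remaining" -gt "$header_size" ]; then
            remaining=$(( remaining - header_size ))
        else
            remaining=0
        fi
    fi
    # Round up so that a partial last block is counted.
    chunks=$(( chunks + (remaining + block_size - 1) / block_size ))
    echo "$chunks"
}

fixed_chunk_count 1049088 4096 512       # 512 B header + 1 MiB data -> 257
fixed_chunk_count $(( 1 << 40 )) 4096    # 1 TiB in 4 KiB blocks -> 268435456
```

The second call shows why a small block size is a poor fit for huge
files: a 1 TiB image would produce hundreds of millions of chunks.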

If you have already made some archives in a repository and then
change the chunker params, this of course impacts deduplication, as
the chunks will be cut differently.

In the worst case (all files are big and were touched in between
backups), this will store all content in the repository again.

Usually, it is not that bad though:

* most files are not touched, so borg will just re-use the old chunks
  it already has in the repo

* files smaller than the (both old and new) minimum chunksize result
  in only one chunk anyway, so the resulting chunks are the same and
  deduplication will apply

If you switch chunker params to save resources for an existing repo
that already has some backup archives, you will see an increasing
effect over time, when more and more files have been touched and
stored again using the bigger chunksize **and** all references to the
smaller older chunks have been removed (by deleting / pruning
archives).

If you want to see an immediate big effect on resource usage, it is
better to start a new repository when changing chunker params.

For more details, see Chunks.


"--noatime / --noctime"
=======================

You can use these "borg create" options to not store the respective
timestamp into the archive, in case you do not really need it.

Besides saving a little space by omitting the timestamp, this might
also improve metadata stream deduplication: if only this timestamp
changes between backups and it is stored in the metadata stream, the
metadata stream chunks won't deduplicate just because of that.


"--nobsdflags / --noflags"
==========================

You can use this to not query and store (or not extract and set)
flags, in case you don't need them or if they are broken somehow for
your filesystem.

On Linux, dealing with the flags needs some additional syscalls.
Especially when dealing with lots of small files, this causes a
noticeable overhead, so you can also use this option to speed up
operations.


"--umask"
=========

borg uses a safe default umask of 077 (that means the files borg
creates have only permissions for the owner, but no permissions for
group and others) - so there should rarely be a need to change the
default behaviour.
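
The effect of a umask of 077 can be seen with plain shell tools,
independently of borg. This is only an illustration; the temporary
directory and file name are arbitrary.

```shell
# Create a file under umask 077 and show its permission bits.
demo_dir=$(mktemp -d)
(
    # The subshell keeps the umask change local to this demo.
    umask 077
    : > "$demo_dir/private-file"
)
# Keep only the type and permission columns of the "ls -l" output.
perms=$(ls -l "$demo_dir/private-file" | cut -c1-10)
echo "$perms"   # -rw-------
rm -rf "$demo_dir"
```

Only the owner can read or write the file; group and others get no
permissions at all.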

This option only affects the process to which it is given. Thus, when
you run borg in client/server mode and you want to change the
behaviour on the server side, you need to use "borg serve --umask=XXX
..." as an SSH forced command in "authorized_keys". The "--umask"
value given on the client side is **not** transferred to the server
side.

Also, if you choose to use the "--umask" option, always be consistent
and use the same umask value so you do not create a mix-up of
permissions in a borg repository or with other files borg creates.


"--read-special"
================

The "--read-special" option is special - you do not want to use it for
normal full-filesystem backups, but rather after carefully picking
some targets for it.

The option "--read-special" triggers special treatment for block and
char device files as well as FIFOs. Instead of storing them as such a
device (or FIFO), they will get opened, their content will be read and
in the backup archive they will show up like a regular file.

Symlinks will also get special treatment if (and only if) they point
to such a special file: instead of storing them as a symlink, the
target special file will get processed as described above.

One intended use case of this is backing up the contents of one or
multiple block devices, such as LVM snapshots, inactive LVs or disk
partitions.

You need to be careful about what you include when using
"--read-special", e.g. if you include "/dev/zero", your backup will
never terminate.
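
Before passing paths to "--read-special", it can help to check which
of them are special files at all. The following is a minimal sketch
using the shell's file tests, which follow symlinks and so roughly
mirror the symlink handling described above. The function name
"classify_for_read_special" is hypothetical and not part of borg.

```shell
# Report whether a path is one of the file types that
# "--read-special" would open and read.
classify_for_read_special() {
    path=$1
    if [ -b "$path" ]; then
        echo "block device"
    elif [ -c "$path" ]; then
        echo "char device"
    elif [ -p "$path" ]; then
        echo "fifo"
    else
        echo "not special"
    fi
}

classify_for_read_special /dev/zero   # a char device on Linux
```

A char device or FIFO that never signals end-of-file is exactly the
kind of target that needs a second look before you back it up.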

Restoring such files' content is currently only supported one at a
time via the "--stdout" option (and you have to redirect stdout to
wherever it shall go, maybe directly into an existing device file of
your choice or indirectly via "dd").

To some extent, mounting a backup archive with the backups of special
files via "borg mount" and then loop-mounting the image files from
inside the mount point will work. If you plan to access a lot of data
in there, it will likely scale and perform better if you do not work
via the FUSE mount.


Example
-------

Imagine you have made some snapshots of logical volumes (LVs) you want
to back up.

Note:

  For some scenarios, this is a good method to get "crash-like"
  consistency (I call it crash-like because it is the same as you
  would get if you just hit the reset button or your machine would
  abruptly and completely crash). This is better than no consistency
  at all and a good method for some use cases, but likely not good
  enough if you have databases running.

Then you create a backup archive of all these snapshots. The backup
process will see a "frozen" state of the logical volumes, while the
processes working in the original volumes continue changing the data
stored there.

You also add the output of "lvdisplay" to your backup, so you can see
the LV sizes in case you ever need to recreate and restore them.

After the backup has completed, you remove the snapshots again.

   $ # create snapshots here
   $ lvdisplay > lvdisplay.txt
   $ borg create --read-special /path/to/repo::arch lvdisplay.txt /dev/vg0/*-snapshot
   $ # remove snapshots here

Now, let's see how to restore some LVs from such a backup.

   $ borg extract /path/to/repo::arch lvdisplay.txt
   $ # create empty LVs with correct sizes here (look into lvdisplay.txt).
   $ # we assume that you created an empty root and home LV and overwrite it now:
   $ borg extract --stdout /path/to/repo::arch dev/vg0/root-snapshot > /dev/vg0/root
   $ borg extract --stdout /path/to/repo::arch dev/vg0/home-snapshot > /dev/vg0/home


Separate compaction
===================

Borg does not auto-compact the segment files in the repository at
commit time (at the end of each repository-writing command) any more.

This is new since borg 1.2.0 and requires borg >= 1.2.0 on client and
server.

As a result, most of the time the repository behaves much as if it
were in append-only mode (see below), until "borg compact" is invoked
or an old client triggers auto-compaction.

This has some notable consequences:

* repository space is not freed immediately when deleting / pruning
  archives

* commands finish quicker

* repository is more robust and might be easier to recover after
  damage (as it contains data in a more sequential manner, historic
  manifests, multiple commits - until you run "borg compact")

* user can choose when to run compaction (it should be done regularly,
  but not necessarily after each single borg command)

* user can choose from where to invoke "borg compact" to do the
  compaction (from client or from server, it does not need a key)

* less repo sync data traffic in case you create a copy of your
  repository by using a sync tool (like rsync, rclone, ...)

You can manually run compaction by invoking the "borg compact"
command.
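
Since pruning no longer frees space by itself, a maintenance script
would typically run both steps in sequence. The sketch below is one
possible arrangement, not a recommendation: the function name
"prune_then_compact" and the retention values are placeholders chosen
for illustration.

```shell
# Prune old archives, then compact to actually free repository space.
# Assumption: the retention values are examples only; adjust to taste.
prune_then_compact() {
    repo=$1
    borg prune --keep-daily 7 --keep-weekly 4 "$repo" \
        && borg compact "$repo"
}
```

Calling "prune_then_compact /path/to/repo" skips compaction if pruning
fails, so a failed prune does not go unnoticed.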


Append-only mode (forbid compaction)
====================================

A repository can be made "append-only", which means that Borg will
never overwrite or delete committed data (append-only refers to the
segment files, but borg will also refuse to delete the repository
completely).

If the "borg compact" command is run on a repo in append-only mode,
there will be no warning or error, but no compaction will happen.

Append-only is useful for scenarios where a backup client machine
backs up remotely to a backup server using "borg serve", since a
hacked client machine cannot delete backups on the server permanently.

To activate append-only mode, set "append_only" to 1 in the repository
config:

   borg config /path/to/repo append_only 1

Note that you can go back-and-forth between normal and append-only
operation with "borg config"; it's not a "one way trip."

In append-only mode Borg will create a transaction log in the
"transactions" file, where each line is a transaction and a UTC
timestamp.

In addition, "borg serve" can act as if a repository is in append-only
mode with its option "--append-only". This can be very useful for
fine-tuning access control in ".ssh/authorized_keys":

   command="borg serve --append-only ..." ssh-rsa <key used for not-always-trustworthy backup clients>
   command="borg serve ..." ssh-rsa <key used for backup management>

Running "borg init" via a "borg serve --append-only" server will *not*
create an append-only repository. Running "borg init --append-only"
creates an append-only repository regardless of server settings.


Example
-------

Suppose an attacker remotely deleted all backups, but your repository
was in append-only mode. A transaction log in this situation might
look like this:

   transaction 1, UTC time 2016-03-31T15:53:27.383532
   transaction 5, UTC time 2016-03-31T15:53:52.588922
   transaction 11, UTC time 2016-03-31T15:54:23.887256
   transaction 12, UTC time 2016-03-31T15:55:54.022540
   transaction 13, UTC time 2016-03-31T15:55:55.472564

From your security logs you conclude the attacker gained access at
15:54:00 and all the backups were deleted or replaced by compromised
backups. From the log you know that transactions 11 and later are
compromised. Note that the transaction ID is the name of the *last*
file in the transaction. For example, transaction 11 spans files 6 to
11.
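
Finding the last transaction before a known point in time can be
scripted. The following is a minimal sketch that assumes the log
format shown above and relies on the timestamps sorting correctly as
plain strings. The function name "last_good_transaction" is
hypothetical.

```shell
# Print the ID of the last transaction logged before the cutoff time.
# Arguments: path to the "transactions" file, cutoff as YYYY-MM-DDTHH:MM:SS.
last_good_transaction() {
    log_file=$1
    cutoff=$2
    awk -v cutoff="$cutoff" '
        {
            id = $2
            sub(/,$/, "", id)          # "5," -> "5"
            # Appending "" forces a string comparison of the timestamps.
            if (($NF "") < (cutoff "")) last = id
        }
        END { print last }
    ' "$log_file"
}
```

For the log above, "last_good_transaction transactions
2016-03-31T15:54:00" prints 5.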

In a real attack you'll likely want to keep the compromised repository
intact to analyze what the attacker tried to achieve. Making a copy
before you start is also a good idea in case something goes wrong
during the recovery. Since recovery is done by deleting some files, a
hard link copy ("cp -al") is sufficient.

The first step to reset the repository to transaction 5, the last
uncompromised transaction, is to remove the "hints.N", "index.N" and
"integrity.N" files in the repository (these files are always
expendable). In this example N is 13.

Then remove or move all segment files from the segment directories in
"data/" starting with file 6:

   rm data/**/{6..13}

That is all that needs to be done in the repository.
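
Because the "rm" command above is destructive, it may be worth listing
the affected segment files first. The sketch below assumes that
segment files are the numerically named files below "data/"; the
function name "list_segments_from" is hypothetical.

```shell
# List segment files whose number is >= the first compromised segment.
# Arguments: repository directory, first segment number to remove.
list_segments_from() {
    repo_dir=$1
    first_bad=$2
    find "$repo_dir/data" -type f | while IFS= read -r segment; do
        name=${segment##*/}
        case $name in
            ''|*[!0-9]*) continue ;;   # skip anything not purely numeric
        esac
        if [ "$name" -ge "$first_bad" ]; then
            printf '%s\n' "$segment"
        fi
    done
}
```

In the example scenario, "list_segments_from /path/to/repo 6" would
print the files for segments 6 to 13, which you can review before
deleting or moving them.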

If you want to access this rolled back repository from a client that
already has a cache for this repository, the cache will reflect a
newer repository state than what you actually have in the repository
now, after the rollback.

Thus, you need to clear the cache:

   borg delete --cache-only repo

The cache will get rebuilt automatically. Depending on repo size and
archive count, it may take a while.

You will also need to remove
"~/.config/borg/security/REPOID/manifest-timestamp".


Drawbacks
---------

As data is only appended, and nothing removed, commands like "prune"
or "delete" won't free disk space; they merely tag data as deleted in
a new transaction.

Be aware that as soon as you write to the repo in non-append-only mode
(e.g. prune, delete or create archives from an admin machine), it will
remove the deleted objects permanently (including the ones that were
already marked as deleted, but not removed, in append-only mode).
Automated edits to the repository (such as a cron job running "borg
prune") will render append-only mode moot if data is deleted.

Even if an archive appears to be available, it is possible an attacker
could delete just a few chunks from an archive and silently corrupt
its data. While in append-only mode, this is reversible, but "borg
check" should be run before a writing/pruning operation on an
append-only repository to catch accidental or malicious corruption:

   # run without append-only mode
   borg check --verify-data repo && borg compact repo

Aside from checking repository & archive integrity you may want to
also manually check backups to ensure their content seems correct.


Further considerations
----------------------

Append-only mode is not respected by tools other than Borg. "rm" still
works on the repository. Make sure that backup client machines only
get to access the repository via "borg serve".

Ensure that no remote access is possible if the repository is
temporarily set to normal mode (for example, for regular pruning).

Further protections can be implemented, but are outside of Borg's
scope. Examples include file system snapshots and wrapping "borg
serve" to set special permissions or ACLs on new data files.


SSH batch mode
==============

When running Borg using an automated script, "ssh" might still ask for
a password, even if there is an SSH key for the target server. Use
this to make scripts more robust:

   export BORG_RSH='ssh -oBatchMode=yes'
