Rsync Internet Backup FAQ
Is there a maximum size or number of files in my data set?
In theory, there is no limit to the number of files or directories that you can Rsync - apart from the practical limit imposed by available RAM.
We have run tests on several different file systems - a typical file system of 70,000 files and 24 GB with under 50 MB of daily changes can be synced in around 10 minutes. The largest file system we've tested contained 200,000 files and 100 GB, and took 20 minutes to sync.
How does Rsync perform on files and directories?
Rsync performs best when run directly on the file system - backing up normal files and directories. It performs far better this way than when it is used to synchronize a backup file offsite.
Let's look at an example to see why that's the case.
Scenario 1: File system with 50,000 files, 50 GB total; 50 files of total size 50 MB have changed.
Rsync identifies which 50 files have changed, and for those files only, it determines the in-file deltas. It calculates checksums on 50 MB of data, and the backup can complete in a matter of minutes. The amount of data transferred will be around 20 MB for typical documents.
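To illustrate what "in-file deltas" means, here is a minimal Python sketch that compares two versions of a file block by block. It is a simplification under stated assumptions: real Rsync uses a rolling checksum so that it can also match blocks that have shifted position within the file. The function names and the tiny block size are hypothetical, chosen only for illustration.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to follow


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one MD5 digest per fixed-size block of data."""
    return [
        hashlib.md5(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the indexes of blocks in `new` that differ from, or do not exist in, `old`.

    Simplified: blocks are compared position by position, so inserted bytes
    that shift later blocks are reported as changes (real Rsync avoids this).
    """
    old_sums = block_checksums(old, block_size)
    new_sums = block_checksums(new, block_size)
    changed = []
    for index, checksum in enumerate(new_sums):
        if index >= len(old_sums) or old_sums[index] != checksum:
            changed.append(index)
    return changed


if __name__ == "__main__":
    old_version = b"aaaabbbbcccc"
    new_version = b"aaaaXbbbcccc"  # only the second block differs
    print(changed_blocks(old_version, new_version))
```

Only the blocks reported as changed would need to be sent, which is why a small edit to a large file does not require re-sending the whole file.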
Scenario 2: The file system is backed up via NTBackup, which results in a 50 GB bkf file.
Rsync will detect that the single bkf file has changed, and needs to determine the in-file deltas. It needs to calculate checksums on 50 GB of data, which may take hours. Additionally, we have found that even if the underlying file system changes very little, about 10% of a bkf file changes from day to day and needs to be transferred. Thus, about 5 GB will be transferred.
This shows that, in terms of both bandwidth and CPU time, it is highly preferable to select the underlying file system for Rsync, not a backup of that file system.
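The difference between the two scenarios can be put into rough numbers with a short Python sketch. It rests on assumptions: the checksum rate of 100 GB per hour is the lower end of the rate quoted for typical servers elsewhere in this FAQ, 1 GB is treated as 1000 MB for simplicity, and the function and variable names are hypothetical.

```python
MB_PER_GB = 1000  # assumption: decimal units, for simple arithmetic


def checksum_hours(data_gb: float, rate_gb_per_hour: float = 100.0) -> float:
    """Hours needed to checksum data_gb at the assumed rate (one end only)."""
    return data_gb / rate_gb_per_hour


# Scenario 1: Rsync on the file system itself.
scenario_1 = {
    "checksummed_gb": 50 / MB_PER_GB,   # only the 50 MB of changed files
    "transferred_gb": 20 / MB_PER_GB,   # around 20 MB for typical documents
}

# Scenario 2: Rsync on a single 50 GB bkf file.
scenario_2 = {
    "checksummed_gb": 50.0,             # the whole backup file
    "transferred_gb": 0.10 * 50.0,      # about 10% of the bkf file changes daily
}

if __name__ == "__main__":
    for name, scenario in (("Scenario 1", scenario_1), ("Scenario 2", scenario_2)):
        hours = checksum_hours(scenario["checksummed_gb"])
        print(
            f"{name}: checksum {hours * 60:.2f} minutes per end, "
            f"transfer {scenario['transferred_gb']:.2f} GB"
        )
```

Under these assumptions, Scenario 2 checksums 1000 times as much data and transfers about 250 times as much as Scenario 1.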
Can I back up Exchange and SQL databases using Rsync?
Yes. We have benchmarked the performance of Rsync on both SQL and Exchange database backups, and conclude that it is feasible to use Rsync to transport these backups offsite. Please see our slideshow presentation for more details.
Can I use Rsync to synchronize my drive images offsite?
Following from the discussions above, we recommend that you select the underlying file system for Rsync, not a backup of the file system.
That said, drive images can be more suitable for Rsync than other types of backup, provided they are uncompressed and unencrypted. The checksum process will still be CPU intensive. We have found on typical servers that checksums can be performed at a rate of about 100-120 GB per hour, during which time the server's CPU is at about 30% on a single core. [Note: on multi-core processors, this means that overall CPU usage is not particularly high.]
So the time to back up via Rsync can be approximately calculated as:
2 * checksum time (one checksum for each end) + network time
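As a worked example of this formula, here is a minimal Python sketch. The checksum rate default comes from the 100-120 GB per hour figure above; the uplink speed, the amount of changed data, and the function and parameter names are hypothetical values chosen for illustration. It also ignores protocol overhead, so treat the result as a rough estimate only.

```python
def estimate_backup_hours(
    image_gb: float,
    changed_gb: float,
    checksum_rate_gb_per_hour: float = 100.0,  # lower end of the measured range
    uplink_mbit_per_s: float = 10.0,           # assumption: your offsite link speed
) -> float:
    """Estimate Rsync time as 2 * checksum time + network time, in hours."""
    checksum_hours = image_gb / checksum_rate_gb_per_hour
    # 1 GB = 8000 megabits (decimal units); divide by link speed to get seconds.
    network_seconds = changed_gb * 8000 / uplink_mbit_per_s
    network_hours = network_seconds / 3600
    return 2 * checksum_hours + network_hours


if __name__ == "__main__":
    # Hypothetical case: a 100 GB drive image of which 5 GB must be transferred.
    hours = estimate_backup_hours(image_gb=100, changed_gb=5)
    print(f"Estimated time: {hours:.1f} hours")
```

For the hypothetical 100 GB image, the two checksum passes account for 2 hours and the 5 GB transfer over a 10 Mbit/s link for a little over 1 hour, so the checksums dominate unless the link is slow or the changes are large.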
So the short answer is: if you really want to do it, you can, but we believe there are better ways to achieve the same goal.
Remember that the purpose of doing multiple backups is redundancy. That means protecting your data in different ways, to different locations. If you synchronize a drive image offsite and that drive image is bad for whatever reason, you have lost your data. However, if you back up your underlying file system using Rsync instead, you still have your files and folders at your remote site even if your image goes bad.
What devices are compatible with Rsync?
Rsync is an open-source program, with an open-specification protocol, so the number of Rsync compatible devices is always growing.
Any typical Linux or Windows machine can be configured to run Rsync. Additionally, several NAS devices are Rsync enabled - please see our Rsync White Paper for more details.
What Open Source software do you use?
BackupAssist uses several Open Source components, which are licensed under the GPL, BSD license or MIT license.
The following table summarises the open source software that we use: