## s3-bsync

Bidirectional syncing tool to sync local filesystem directories with S3
buckets. Developed by [Josh Stockin](https://joshstock.in) and licensed under
the MIT License.

**Work in progress (v0.1.0). Not in a functional or usable state. Do NOT use
this unless you know what you are doing.**

### Behavior

After an initial sync (manually handling conflicts and files not common to
both hosts), the S3 bucket maintains precedence. Files with the same size and
modify time on both hosts are ignored. A newer copy of a file always
overwrites the corresponding older copy, regardless of any changes made to the
older copy. (In other words, **there is no manual conflict resolution after
the first sync. Conflicting files are handled automatically as described
here.** This script is meant to run without input or output by default, in a
cron job for example.) Untracked files, whether in S3 or on the local machine,
are copied to the opposite host and tracked. Tracked files that are moved or
removed on either host are moved or removed on the corresponding host, with
the tracking adjusted accordingly. Ultimately, after a sync, the
`.state.s3sync` state tracking file should match the contents of both the S3
bucket and the local synced directories.
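
The per-file rules above can be sketched in Python roughly as follows. This is an illustration only, not the program's actual code: `FileInfo`, `Action`, and `decide` are hypothetical names, moves and removals of tracked files are not modeled, and the tie-break for equal modify times with different sizes (S3 wins) is an assumption drawn from "the S3 bucket maintains precedence".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    SKIP = "skip"          # nothing to do
    UPLOAD = "upload"      # local copy replaces (or creates) the S3 copy
    DOWNLOAD = "download"  # S3 copy replaces (or creates) the local copy


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical record of what is known about one copy of a file."""
    size: int   # size in bytes
    mtime: int  # modify time as an integer timestamp


def decide(local: Optional[FileInfo], remote: Optional[FileInfo]) -> Action:
    """Pick the sync action for one path; None means the copy does not exist."""
    if local is None and remote is None:
        return Action.SKIP
    if local is None:
        return Action.DOWNLOAD  # untracked file in S3 is copied to the local host
    if remote is None:
        return Action.UPLOAD    # untracked local file is copied to S3
    if local.size == remote.size and local.mtime == remote.mtime:
        return Action.SKIP      # same size and modify time: ignored
    if local.mtime > remote.mtime:
        return Action.UPLOAD    # newer copy always overwrites the older one
    # Remote is newer, or the times tie with different sizes.
    # Assumption: on a tie the S3 bucket takes precedence.
    return Action.DOWNLOAD
```

Because the decision depends only on size and modify time, it needs no user input, which fits the unattended cron-job use described above.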

### Installation

Depends on `python3` and `aws-cli`. Both can be installed with your package
manager. The Python modules `pip` and `setuptools` are required if you want to
install to your system path using one of the methods listed below. The Python
module `python-gnupg` is needed only if you wish to use the GPG encryption
options.

Install with one of the following:

* `./install.sh [<python interpreter>?]` (Preferred)
* `python3 -m pip install .`
* `python3 ./setup.py install` (Not recommended)

Uninstall with one of the following:

* `./install.sh uninstall [<python interpreter>?]` (Preferred)
* `python3 -m pip uninstall s3-bsync`

`install.sh` is a frontend for `pip (un)install`, configured by setuptools in
`setup.py`. The script automatically performs compatibility checks on the
Python interpreter and other required dependencies.

Root permissions are not required. *This program does not manage S3
authentication or `aws-cli` credentials. You must do this yourself with the
`aws configure` command, or through some other means of IAM/S3 policy.*

### Usage

```
usage: s3-bsync [--help] [--version] [--init] [--debug] [--dryrun] [--file SYNCFILE]
                [--dump] [--purge] [--overwrite] [--dir PATH S3_DEST] [--rmdir RMPATH]

Bidirectional syncing tool to sync local filesystem directories with S3 buckets.

optional arguments:
  --help, -h, -?      Display this help message and exit.
  --version, -v       Display program and version information and exit.

program behavior:
  The program runs in sync mode by default.

  --init, -i          Run in initialize (edit) mode. This allows tracking file
                      management and directory options to be used. (default: False)
  --debug             Enables debug mode, which prints program information to stdout.
                      (default: False)
  --dryrun            Run program logic without making changes. Useful when paired with
                      debug mode to see what changes would be made. (default: False)

tracking file management:
  Configuring the tracking file.

  --file SYNCFILE     The s3sync state file used to store tracking and state
                      information. It should resolve to an absolute path. (default:
                      ['~/.state.s3sync'])
  --dump              Dump s3sync state file configuration and exit. (default: False)
  --purge             Deletes the tracking configuration file if it exists and exits.
                      Requires init mode. (default: False)
  --overwrite         Overwrite tracking file with new directory maps instead of
                      appending. Requires init mode. (default: False)

directory mapping:
  Requires initialize mode to be enabled.

  --dir PATH S3_DEST  Directory map to detail which local directory corresponds to S3
                      bucket and key prefix. Can be used multiple times to set multiple
                      directories. Local directories must be absolute. S3 destination in
                      `s3://bucket-name/prefix` format. Example: `--dir
                      /home/josh/Documents s3://joshstockin/Documents`
  --rmdir RMPATH      Remove tracked directory map by local directory identifier.
                      Running `--rmdir /home/josh/Documents` would remove the directory
                      map from the s3syncfile and stop tracking/syncing that directory.
```
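
To make the `--dir PATH S3_DEST` format concrete, here is a minimal Python sketch of how such a pair could be validated and split into a bucket name and key prefix. The function names `split_s3_dest` and `parse_dir_map` are hypothetical and are not taken from the program's source; only the `s3://bucket-name/prefix` format and the absolute-path requirement come from the help text above.

```python
import os.path
from typing import Tuple
from urllib.parse import urlsplit


def split_s3_dest(s3_dest: str) -> Tuple[str, str]:
    """Split 's3://bucket-name/prefix' into (bucket, prefix)."""
    parts = urlsplit(s3_dest)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"expected s3://bucket-name/prefix, got {s3_dest!r}")
    # Drop the leading '/' of the URL path and any trailing '/',
    # so the prefix carries no '/' termination.
    return parts.netloc, parts.path.strip("/")


def parse_dir_map(local_path: str, s3_dest: str) -> Tuple[str, str, str]:
    """Validate one --dir pair and return (local_path, bucket, prefix)."""
    if not os.path.isabs(local_path):
        raise ValueError(f"local directory must be absolute: {local_path!r}")
    bucket, prefix = split_s3_dest(s3_dest)
    return local_path, bucket, prefix


print(parse_dir_map("/home/josh/Documents", "s3://joshstockin/Documents"))
```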

#### Source files

* `setup.py` manages installation metadata.
* `install.sh` handles installation and uninstallation using pip.

#### Created files and .s3syncignore

The default file used to store sync information is `~/.state.s3sync`, but this
location can be reconfigured with the `--file` option. The file uses the
binary s3sync file format specified later in this document. If you want to
intentionally ignore untracked files, use a `.s3syncignore` file, in the same
manner as [`.gitignore`](https://git-scm.com/docs/gitignore).
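
As a rough illustration of ignore-file matching, the Python sketch below handles a small subset of `.gitignore` syntax: blank lines, `#` comments, and shell-style wildcards. It is an assumption-laden simplification, not the program's implementation: `load_patterns` and `is_ignored` are hypothetical names, and negation (`!`), anchoring (leading `/`), and `**` are not handled.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import List


def load_patterns(text: str) -> List[str]:
    """Return the non-blank, non-comment lines of an ignore file's text."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """True if the whole relative path or any single component matches."""
    path = PurePosixPath(rel_path)
    for pattern in patterns:
        pattern = pattern.rstrip("/")  # treat 'build/' like 'build'
        if fnmatch.fnmatchcase(str(path), pattern):
            return True
        if any(fnmatch.fnmatchcase(part, pattern) for part in path.parts):
            return True
    return False


patterns = load_patterns("# scratch files\n*.tmp\n\nbuild/\n")
print(is_ignored("notes/draft.tmp", patterns))
print(is_ignored("src/main.py", patterns))
```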

## s3sync file format

The `.state.s3sync` file, saved in the home directory by default, defines the
state of tracked objects from the specified S3 buckets and key prefixes as of
the last sync.

### Control bytes

* `90` - Begin bucket block
* `91` - End bucket block
* `92` - Begin directory map
* `93` - End directory map
* `94` - Begin object block
* `95` - End object block
* `96` - ETag type MD5
* `97` - ETag type null-terminated string (non-MD5)
* `98` - (unassigned)
* `99` - (unassigned)
* `9A` - Begin metadata block
* `9B` - End metadata block
* `9C` - (unassigned)
* `9D` - File signature byte
* `9E` - (unassigned)
* `9F` - File signature byte
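
For reference, the assigned control bytes can be written down as a Python `IntEnum`. This is only a sketch: the member names are hypothetical, and the values are read as hexadecimal, which the `9A` to `9F` entries imply.

```python
from enum import IntEnum


class ControlByte(IntEnum):
    """Assigned s3sync control bytes (hypothetical names, hexadecimal values)."""
    BEGIN_BUCKET = 0x90
    END_BUCKET = 0x91
    BEGIN_DIRECTORY_MAP = 0x92
    END_DIRECTORY_MAP = 0x93
    BEGIN_OBJECT = 0x94
    END_OBJECT = 0x95
    ETAG_MD5 = 0x96
    ETAG_STRING = 0x97  # null-terminated string, non-MD5
    BEGIN_METADATA = 0x9A
    END_METADATA = 0x9B
    SIGNATURE_1 = 0x9D
    SIGNATURE_2 = 0x9F


def is_control_byte(value: int) -> bool:
    """True if value is one of the assigned control bytes."""
    return any(value == member for member in ControlByte)


# The two signature bytes, in order, as raw bytes.
print(bytes([ControlByte.SIGNATURE_1, ControlByte.SIGNATURE_2]).hex())
```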

### File structure

Version 1 of the s3sync file format.

```
Header {
    File signature - 4 bytes - 9D 9F 53 33
    File version - 1 byte - 01
}
Metadata block {
    Begin metadata block control byte - 9A
    Last synced time - 8 bytes uint
    End metadata block control byte - 9B
}
Bucket block {
    Begin bucket block control byte - 90
    Bucket name - null-terminated string
    Directory map {
        Begin directory map block control byte - 92
        Path to local directory - null-terminated string
        S3 key prefix (no `/` termination) - null-terminated string
        Compress (gzip level) - 0-11 (1 byte)
        Recursive sync - 1 byte boolean
        GPG encryption enabled - 1 byte boolean
        GPG encryption email - null-terminated string
        End directory map block control byte - 93
    }...
    Recorded object {
        Begin object block control byte - 94
        Key - null-terminated string
        Last modified time - 8 bytes uint
        ETag type - 96 or 97
        ETag - 16 bytes or null-terminated string
        File size - 8 bytes uint
        End object block control byte - 95
    }...
    End bucket block control byte - 91
}...
```
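
The layout above can be exercised with a short Python sketch that writes and reads the header, the metadata block, and individual recorded objects. Several details are assumptions, because the specification does not state them: integers are treated as big-endian, strings as UTF-8, and all function names are hypothetical. Bucket blocks and directory maps are left out for brevity, so the objects here are not wrapped in a bucket block as a real file would require.

```python
import struct

SIGNATURE = bytes.fromhex("9D9F5333")
VERSION = 1
BEGIN_METADATA, END_METADATA = 0x9A, 0x9B
BEGIN_OBJECT, END_OBJECT = 0x94, 0x95
ETAG_MD5, ETAG_STRING = 0x96, 0x97


def pack_cstring(text):
    """Encode a null-terminated string (UTF-8 is an assumption)."""
    return text.encode("utf-8") + b"\x00"


def read_cstring(data, pos):
    """Read a null-terminated string at pos; return (text, next position)."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("utf-8"), end + 1


def pack_preamble(last_synced):
    """Header followed by the metadata block (big-endian is an assumption)."""
    return (SIGNATURE + bytes([VERSION, BEGIN_METADATA])
            + struct.pack(">Q", last_synced) + bytes([END_METADATA]))


def parse_preamble(data):
    """Return (version, last_synced, offset just past the metadata block)."""
    if len(data) < 15 or data[:4] != SIGNATURE:
        raise ValueError("not an s3sync file")
    if data[5] != BEGIN_METADATA or data[14] != END_METADATA:
        raise ValueError("malformed metadata block")
    (last_synced,) = struct.unpack_from(">Q", data, 6)
    return data[4], last_synced, 15


def pack_object(key, mtime, etag, size):
    """etag is 16 raw bytes for an MD5 ETag, or a str for a non-MD5 ETag."""
    if isinstance(etag, bytes):
        if len(etag) != 16:
            raise ValueError("MD5 ETag must be exactly 16 bytes")
        etag_field = bytes([ETAG_MD5]) + etag
    else:
        etag_field = bytes([ETAG_STRING]) + pack_cstring(etag)
    return (bytes([BEGIN_OBJECT]) + pack_cstring(key) + struct.pack(">Q", mtime)
            + etag_field + struct.pack(">Q", size) + bytes([END_OBJECT]))


def parse_object(data, pos):
    """Return ((key, mtime, etag, size), next position)."""
    if data[pos] != BEGIN_OBJECT:
        raise ValueError("expected begin object control byte")
    key, pos = read_cstring(data, pos + 1)
    (mtime,) = struct.unpack_from(">Q", data, pos)
    pos += 8
    etag_type = data[pos]
    pos += 1
    if etag_type == ETAG_MD5:
        etag, pos = data[pos:pos + 16], pos + 16
    elif etag_type == ETAG_STRING:
        etag, pos = read_cstring(data, pos)
    else:
        raise ValueError("unknown ETag type")
    (size,) = struct.unpack_from(">Q", data, pos)
    pos += 8
    if data[pos] != END_OBJECT:
        raise ValueError("expected end object control byte")
    return (key, mtime, etag, size), pos + 1


blob = pack_preamble(1650000000) + pack_object("Documents/a.txt", 1650000000, bytes(range(16)), 42)
version, last_synced, offset = parse_preamble(blob)
print(version, last_synced, parse_object(blob, offset)[0])
```

The MD5 ETag is read as a fixed 16 bytes rather than scanned for a terminator, since raw digest bytes may themselves contain `00`.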

## Copyright

This program is copyrighted by [Joshua Stockin](https://joshstock.in/) and
licensed under the [MIT License](LICENSE).

A form of the following should be present in each source file.

```txt
s3-bsync Copyright (c) 2022 Joshua Stockin
<https://joshstock.in>
<https://git.joshstock.in/s3-bsync>

This software is licensed and distributed under the terms of the MIT License.
See the MIT License in the LICENSE file of this project's root folder.

This comment block and its contents, including this disclaimer, MUST be
preserved in all copies or distributions of this software's source.
```

<<https://joshstock.in>> | [josh@joshstock.in](mailto:josh@joshstock.in) | joshuas3#9641