# s3-bsync

Bidirectional syncing tool to sync local filesystem directories with S3
buckets. Written by [Josh Stockin](https://joshstock.in).

**Work in progress. Not in a functional state. Do NOT use this.**

### Behavior

After an initial sync (in which conflicts and files not common to both hosts
are handled manually), the S3 bucket maintains precedence. Files with the same
size and modify time on both hosts are ignored. A newer copy of a file always
overwrites the corresponding older copy, regardless of any changes made to the
older copy. (In other words, **there is no manual conflict resolution after
the first sync. Conflicting files are handled automatically as described
here.** This script is meant to run without input or output by default, in a
cron job for example.) Untracked files, whether in S3 or on the local machine,
are copied to the opposite host and tracked. Tracked files that are moved or
removed on either host are moved or removed on the corresponding host, with
the tracking adjusted accordingly. Ultimately, after a sync, the
`.state.s3sync` state tracking file should match the contents of the S3
bucket's synced directories.

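The per-file rules above can be sketched as a small decision function. This is only an illustration, not the project's implementation: `FileInfo`, `decide`, and the action names are hypothetical, and the tie-break (same modify time but different size resolves in favor of S3) is an assumption drawn from "the S3 bucket maintains precedence". Moves and removals of tracked files depend on the state file and are not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical description of one copy of a file (not part of s3-bsync)."""
    size: int   # size in bytes
    mtime: int  # modify time as a Unix timestamp


def decide(local: Optional[FileInfo], remote: Optional[FileInfo]) -> str:
    """Return "skip", "upload", or "download" for a single path.

    `local` or `remote` is None when the file does not exist on that host.
    """
    if local is None and remote is None:
        return "skip"
    if remote is None:
        return "upload"    # only on the local machine: copy to S3
    if local is None:
        return "download"  # only in S3: copy to the local machine
    if local.size == remote.size and local.mtime == remote.mtime:
        return "skip"      # same size and modify time: ignored
    if local.mtime > remote.mtime:
        return "upload"    # the newer copy overwrites the older one
    # Remote copy is newer, or (assumption) the modify times tie while the
    # sizes differ, in which case the S3 bucket takes precedence.
    return "download"
```

For example, `decide(FileInfo(10, 200), FileInfo(12, 100))` selects an upload because the local copy has the later modify time.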
### Installation

Depends on `python3` and `aws-cli`. Both can be installed with your package
manager. Requires Python modules `pip` and `setuptools` if you want to install
on your system path using one of the methods listed below.

Install with one of the following:

* `./install.sh [interpreter?]` (Preferred)
* `python3 -m pip install .`
* `./setup.py` (Not recommended)

Uninstall with one of the following:

* `./install.sh uninstall [interpreter?]` (Preferred)
* `python3 -m pip uninstall s3-bsync`

`install.sh` is a frontend for `pip (un)install`, configured by setuptools in
`setup.py`.

Root permissions are not required. *This program does not manage S3
authentication or `aws-cli` credentials. You must do this yourself with the
`aws configure` command, or through some other means of IAM/S3 policy.*

### Usage

```
usage: s3-bsync [-h] [-v] [-i] [--debug] [--file SYNCFILE] [--dump] [--dryrun] [--purge]
                [--overwrite] [--dir PATH S3_DEST]

Bidirectional syncing tool to sync local filesystem directories with S3 buckets.

optional arguments:
  -h, -?, --help      Display this help message and exit.
  -v, --version       Display program and version information and exit.

program behavior:
  The program runs in sync mode by default.

  -i, --init          Run in initialize mode. This allows tracking file management and
                      directory options to be used. (default: False)
  --debug             Enables debug mode, which prints program information to stdout.
                      (default: False)
  --file SYNCFILE     The s3sync state file used to store tracking and state
                      information. It should resolve to an absolute path. (default:
                      ['~/.state.s3sync'])
  --dump              Dump s3sync state file configuration. --dryrun implicitly enabled.
                      (default: False)
  --dryrun            Run program logic without making changes. Useful when paired with
                      debug mode to see what changes would be made. (default: False)

tracking file management:
  Requires initialize mode to be enabled.

  --purge             Deletes the default (if not otherwise specified with --file)
                      tracking configuration file if it exists. (default: False)
  --overwrite         Overwrite tracking file with new directory maps instead of
                      appending. (default: False)

directory mapping:
  Requires initialize mode to be enabled.

  --dir PATH S3_DEST  Directory map to detail which local directory corresponds to S3
                      bucket and key prefix. Can be used multiple times to set multiple
                      directories. Local directories must be absolute. S3 destination in
                      `s3://bucket-name/prefix` format. Example: `--dir
                      /home/josh/Documents s3://joshstockin/Documents`
```

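As a rough illustration of the `--dir PATH S3_DEST` constraints described in the help text (absolute local directory, `s3://bucket-name/prefix` destination), the following sketch validates one pair and splits the destination into bucket and key prefix. The function name `parse_dir_map` and its return shape are hypothetical and not taken from the s3-bsync source.

```python
import os
from typing import Tuple
from urllib.parse import urlparse


def parse_dir_map(path: str, s3_dest: str) -> Tuple[str, str, str]:
    """Validate one PATH / S3_DEST pair and return (local_dir, bucket, prefix)."""
    if not os.path.isabs(path):
        raise ValueError(f"local directory must be absolute: {path!r}")

    parsed = urlparse(s3_dest)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected s3://bucket-name/prefix, got {s3_dest!r}")

    # Store the key prefix without leading or trailing slashes.
    prefix = parsed.path.strip("/")
    return path, parsed.netloc, prefix
```

With the example from the help text, `parse_dir_map("/home/josh/Documents", "s3://joshstockin/Documents")` yields the local directory, the bucket `joshstockin`, and the prefix `Documents`.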
#### Source files

* `setup.py` manages installation metadata.
* `install.sh` handles installation and uninstallation using pip.

#### Created files and .s3syncignore

The default file used to store sync information is `~/.state.s3sync`, but this
location can be changed with the `--file` option. The file uses the binary
s3sync file format specified later in this document. If you want to
intentionally ignore untracked files, use a `.s3syncignore` file, in the same
manner as [`.gitignore`](https://git-scm.com/docs/gitignore).

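To give a feel for ignore-file matching, here is a deliberately simplified sketch. It assumes only blank lines, `#` comments, and shell-style glob patterns; it does not implement the full `.gitignore` rules (no negation with `!`, no anchoring with a leading `/`, no `**`). The helper names `load_patterns` and `is_ignored` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def load_patterns(lines: Iterable[str]) -> List[str]:
    """Keep non-empty lines that are not `#` comments."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """True if the whole path or any single path component matches a pattern."""
    parts = rel_path.split("/")
    return any(
        fnmatchcase(rel_path, pattern)
        or any(fnmatchcase(part, pattern) for part in parts)
        for pattern in patterns
    )
```

A pattern such as `*.tmp` would then exclude `notes/draft.tmp`, and a bare name such as `cache` would exclude everything under a `cache` directory.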
## s3sync file format

The `.state.s3sync` file saved in the home directory defines the state of
tracked objects from the specified S3 buckets and key prefixes used in the
last sync.

### Control bytes

* `90` - Begin bucket block
* `91` - End bucket block
* `92` - Begin directory map
* `93` - End directory map
* `94` - Begin object block
* `95` - End object block
* `96` - ETag type MD5
* `97` - ETag type null-terminated string (non-MD5)
* `98`
* `99`
* `9A` - Begin metadata block
* `9B` - End metadata block
* `9C`
* `9D` - File signature byte
* `9E`
* `9F` - File signature byte

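One way to carry these values in code is an integer enum. The sketch below is illustrative only: the member names and the `describe` helper are hypothetical, and the bytes listed without a description (`98`, `99`, `9C`, `9E`) are simply left out.

```python
from enum import IntEnum


class ControlByte(IntEnum):
    """Control byte values from the list above (member names are made up)."""
    BEGIN_BUCKET = 0x90
    END_BUCKET = 0x91
    BEGIN_DIRECTORY_MAP = 0x92
    END_DIRECTORY_MAP = 0x93
    BEGIN_OBJECT = 0x94
    END_OBJECT = 0x95
    ETAG_MD5 = 0x96
    ETAG_STRING = 0x97
    BEGIN_METADATA = 0x9A
    END_METADATA = 0x9B
    SIGNATURE_FIRST = 0x9D
    SIGNATURE_SECOND = 0x9F


def describe(value: int) -> str:
    """Return the member name for a byte value, or "UNASSIGNED" if none exists."""
    try:
        return ControlByte(value).name
    except ValueError:
        return "UNASSIGNED"
```

Because `IntEnum` members are plain integers, they can be placed directly into `bytes([...])` when building or comparing file contents.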
### File structure

Version 1 of the s3sync file format.

```
Header {
    File signature - 4 bytes - 9D 9F 53 33
    File version - 1 byte - 01
}
Metadata block {
    Begin metadata block control byte - 9A
    Last synced time - 8 bytes uint
    End metadata block control byte - 9B
}
Bucket block {
    Begin bucket block control byte - 90
    Bucket name - null-terminated string
    Directory map {
        Begin directory map block control byte - 92
        Path to local directory - null-terminated string
        S3 key prefix (no `/` termination) - null-terminated string
        Compress (gzip level) - 0-11 (4 bytes)
        Recursive sync - 1 byte boolean
        GPG encryption enabled - 1 byte boolean
        GPG encryption email - null-terminated string
        End directory map block control byte - 93
    }...
    Recorded object {
        Begin object block control byte - 94
        Key - null-terminated string
        Last modified time - 8 bytes uint
        ETag type - 96 or 97
        ETag - 16 bytes or null-terminated string
        File size - 8 bytes uint
        End object block control byte - 95
    }...
    End bucket block control byte - 91
}...
```

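The header, metadata block, and a recorded object from the layout above could be encoded roughly as follows. This is a sketch under stated assumptions, not the project's serializer: the layout does not specify byte order or string encoding, so big-endian integers and UTF-8 strings are assumed here, and all function names are hypothetical.

```python
import struct
from typing import Tuple

SIGNATURE = bytes.fromhex("9D9F5333")  # 9D 9F followed by ASCII "S3"
VERSION = 1
UINT64 = ">Q"  # assumption: big-endian; the layout does not state byte order


def pack_header_and_metadata(last_synced: int) -> bytes:
    """Encode the header followed by the metadata block."""
    return (
        SIGNATURE
        + bytes([VERSION])
        + b"\x9a"                           # begin metadata block
        + struct.pack(UINT64, last_synced)  # last synced time, 8-byte uint
        + b"\x9b"                           # end metadata block
    )


def unpack_header_and_metadata(data: bytes) -> Tuple[int, int]:
    """Return (last_synced, offset of the first byte after the metadata block)."""
    if len(data) < 15:
        raise ValueError("truncated header or metadata block")
    if data[:4] != SIGNATURE:
        raise ValueError("bad file signature")
    if data[4] != VERSION:
        raise ValueError(f"unsupported file version: {data[4]}")
    if data[5] != 0x9A or data[14] != 0x9B:
        raise ValueError("malformed metadata block")
    (last_synced,) = struct.unpack_from(UINT64, data, 6)
    return last_synced, 15


def pack_object(key: str, mtime: int, md5: bytes, size: int) -> bytes:
    """Encode one recorded object that carries an MD5 ETag (type byte 96)."""
    if len(md5) != 16:
        raise ValueError("an MD5 ETag must be exactly 16 bytes")
    return (
        b"\x94"                          # begin object block
        + key.encode("utf-8") + b"\x00"  # assumption: UTF-8, null-terminated
        + struct.pack(UINT64, mtime)     # last modified time
        + b"\x96" + md5                  # ETag type MD5, then the 16-byte digest
        + struct.pack(UINT64, size)      # file size
        + b"\x95"                        # end object block
    )
```

Under these assumptions the header plus metadata block always occupies 15 bytes, so a reader can start scanning for bucket blocks at that offset.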
## Copyright

This program is copyrighted by [Joshua Stockin](https://joshstock.in/) and
licensed under the [MIT License](LICENSE).

A form of the following should be present in each source file.

```txt
s3-bsync Copyright (c) 2022 Joshua Stockin
<https://joshstock.in>
<https://git.joshstock.in/s3-bsync>

This software is licensed and distributed under the terms of the MIT License.
See the MIT License in the LICENSE file of this project's root folder.

This comment block and its contents, including this disclaimer, MUST be
preserved in all copies or distributions of this software's source.
```

<<https://joshstock.in>> | [josh@joshstock.in](mailto:josh@joshstock.in) | joshuas3#9641