# s3-bsync

Bidirectional syncing tool to sync local filesystem directories with S3
buckets. Written by [Josh Stockin](https://joshstock.in).

**Work in progress. Not in a functional state. Do NOT use this.**

### Behavior

After an initial sync (in which conflicts and files not common to both hosts
are handled manually), the S3 bucket maintains precedence. Files with the same
size and modify time on both hosts are ignored. A newer copy of a file always
overwrites the corresponding older copy, regardless of any changes made to the
older copy. (In other words, **there is no manual conflict resolution after the
first sync. Conflicting files are handled automatically as described here.**
This script is meant to run without input or output by default, in a cron job
for example.) Untracked files, whether in S3 or on the local machine, are
copied to the opposite host and tracked. Tracked files that are moved or
removed on either host are moved or removed on the other host, with the
tracking adjusted accordingly. Ultimately, after a sync, the `.state.s3sync`
state tracking file should match the contents of the S3 bucket's synced
directories.

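The per-file rules above can be sketched roughly as follows. This is an illustration only, not the project's implementation: the names `FileInfo`, `Action`, and `decide_action` are hypothetical. It assumes that a tie in modify time with differing sizes is resolved in favor of S3, which is one reading of "the S3 bucket maintains precedence". Moves and removals of tracked files are not covered, since they depend on the state file.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    SKIP = "skip"          # nothing to do
    UPLOAD = "upload"      # local copy replaces (or creates) the S3 copy
    DOWNLOAD = "download"  # S3 copy replaces (or creates) the local copy


@dataclass(frozen=True)
class FileInfo:
    size: int   # size in bytes
    mtime: int  # modify time as an integer timestamp


def decide_action(local: Optional[FileInfo], remote: Optional[FileInfo]) -> Action:
    """Pick a sync action for one path seen locally and/or in S3."""
    if local is None and remote is None:
        raise ValueError("file must exist on at least one host")
    # Untracked file on one host only: copy it to the opposite host.
    if remote is None:
        return Action.UPLOAD
    if local is None:
        return Action.DOWNLOAD
    # Same size and modify time on both hosts: ignored.
    if local.size == remote.size and local.mtime == remote.mtime:
        return Action.SKIP
    # The newer copy always overwrites the older one.
    if local.mtime > remote.mtime:
        return Action.UPLOAD
    # Assumption: on equal modify times with differing sizes, S3 wins.
    return Action.DOWNLOAD
```

For example, `decide_action(FileInfo(10, 200), FileInfo(12, 100))` selects an upload because the local copy is newer.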
### Installation

Depends on `python3` and `aws-cli`. Both can be installed with your package
manager. Requires Python modules `pip` and `setuptools` if you want to install
on your system path using one of the methods listed below.

Install with one of the following:

* `./install.sh [interpreter?]` (Preferred)
* `python3 -m pip install .`
* `./setup.py` (Not recommended)

Uninstall with one of the following:

* `./install.sh uninstall [interpreter?]` (Preferred)
* `python3 -m pip uninstall s3-bsync`

`install.sh` is a frontend for `pip (un)install`, configured by setuptools in
`setup.py`.

Root permissions are not required. *This program does not manage S3
authentication or `aws-cli` credentials. You must do this yourself with the
`aws configure` command, or through some other means of IAM/S3 policy.*

#### Source files

* `setup.py` manages installation metadata.
* `install.sh` handles installation and uninstallation using pip.

#### Created files and .s3syncignore

The default file used to store sync information is `~/.state.s3sync`, but this
location can be reconfigured. The file uses the binary s3sync file format
specified later in this document. To intentionally leave untracked files out of
a sync, list them in a `.s3syncignore` file, in the same manner as
[`.gitignore`](https://git-scm.com/docs/gitignore).

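As a rough illustration of ignore-file matching, the sketch below checks relative paths against glob patterns. It is a deliberately simplified stand-in and not the project's code: `load_patterns` and `is_ignored` are hypothetical names, and only plain shell-style globs are handled, whereas full `.gitignore` semantics (negation with `!`, directory-only patterns, `**`, anchoring) are out of scope here.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def load_patterns(lines: Iterable[str]) -> List[str]:
    """Keep the non-empty, non-comment lines of an ignore file."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Return True if the relative path or its file name matches any pattern."""
    name = path.rsplit("/", 1)[-1]
    # Simplification: a pattern matches either the whole path or the base name.
    return any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns)
```

With patterns loaded from the lines `*.tmp` and `secret.txt`, the path `a/b/file.tmp` would be ignored while `docs/readme.md` would not.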
## s3sync file format

The `.state.s3sync` file, saved in the home directory by default, defines the
state of tracked objects from the specified S3 buckets and key prefixes as of
the last sync.

### Control bytes

All values are hexadecimal.

* `90` - Begin bucket block
* `91` - End bucket block
* `92` - Begin directory map
* `93` - End directory map
* `94` - Begin object block
* `95` - End object block
* `96` - ETag type MD5
* `97` - ETag type null-terminated string (non-MD5)
* `98`
* `99`
* `9A` - Begin metadata block
* `9B` - End metadata block
* `9C`
* `9D` - File signature byte
* `9E`
* `9F` - File signature byte

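For reference, the assigned control bytes could be collected in an enumeration like the one below. The class and member names are hypothetical; only the byte values and their meanings come from the list above. Bytes listed without a meaning (`98`, `99`, `9C`, `9E`) are left out.

```python
from enum import IntEnum


class ControlByte(IntEnum):
    """Assigned control bytes of the s3sync file format (names are illustrative)."""
    BUCKET_BEGIN = 0x90
    BUCKET_END = 0x91
    DIRECTORY_MAP_BEGIN = 0x92
    DIRECTORY_MAP_END = 0x93
    OBJECT_BEGIN = 0x94
    OBJECT_END = 0x95
    ETAG_MD5 = 0x96
    ETAG_STRING = 0x97     # null-terminated, non-MD5 ETag
    METADATA_BEGIN = 0x9A
    METADATA_END = 0x9B
    SIGNATURE_9D = 0x9D    # file signature byte
    SIGNATURE_9F = 0x9F    # file signature byte
```

Because `IntEnum` members behave as integers, a byte read from a file can be mapped with `ControlByte(value)`, which raises `ValueError` for an unassigned value.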
### File structure

Version 1 of the s3sync file format.

```
Header {
  File signature - 4 bytes - 9D 9F 53 33
  File version - 1 byte - 01
}
Metadata block {
  Begin metadata block control byte - 9A
  Last synced time - 8 bytes uint
  End metadata block control byte - 9B
}
Bucket block {
  Begin bucket block control byte - 90
  Bucket name - null-terminated string
  Directory map {
    Begin directory map block control byte - 92
    Path to local directory - null-terminated string
    S3 key prefix - null-terminated string
    Recursive sync - 1 byte boolean
    End directory map block control byte - 93
  }...
  Recorded object {
    Begin object block control byte - 94
    Key - null-terminated string
    Last modified time - 8 bytes uint
    ETag type - 96 or 97
    ETag - 16 bytes or null-terminated string
    File size - 8 bytes uint
    End object block control byte - 95
  }...
  End bucket block control byte - 91
}...
```

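To make the layout concrete, here is a minimal sketch that serializes the header, the metadata block, and one recorded object with an MD5 ETag. It is not the project's code, and the function names are hypothetical. Two details are assumptions because the format description does not state them: integers are packed big-endian, and strings are encoded as UTF-8.

```python
import struct

SIGNATURE = bytes([0x9D, 0x9F, 0x53, 0x33])  # 9D 9F 'S' '3'
VERSION = 1


def pack_header_and_metadata(last_synced: int) -> bytes:
    """Header followed by the metadata block (15 bytes in total)."""
    # Assumption: 8-byte unsigned integers are big-endian (">Q").
    return (SIGNATURE + bytes([VERSION])
            + b"\x9a" + struct.pack(">Q", last_synced) + b"\x9b")


def unpack_header_and_metadata(data: bytes) -> int:
    """Validate the first 15 bytes and return the last synced time."""
    if len(data) < 15 or data[:4] != SIGNATURE:
        raise ValueError("not an s3sync file")
    if data[4] != VERSION:
        raise ValueError("unsupported s3sync file version")
    if data[5] != 0x9A or data[14] != 0x9B:
        raise ValueError("malformed metadata block")
    (last_synced,) = struct.unpack(">Q", data[6:14])
    return last_synced


def pack_object(key: str, mtime: int, md5: bytes, size: int) -> bytes:
    """One recorded object whose ETag is a raw 16-byte MD5 digest."""
    if len(md5) != 16:
        raise ValueError("MD5 ETag must be exactly 16 bytes")
    return (b"\x94"
            + key.encode("utf-8") + b"\x00"  # assumption: UTF-8 keys
            + struct.pack(">Q", mtime)
            + b"\x96" + md5                  # ETag type MD5, then the digest
            + struct.pack(">Q", size)
            + b"\x95")
```

A round trip such as `unpack_header_and_metadata(pack_header_and_metadata(t))` should return `t` for any timestamp that fits in 8 bytes.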
## Copyright

This program is copyrighted by [Joshua Stockin](https://joshstock.in/) and
licensed under the [MIT License](LICENSE).

A form of the following should be present in each source file.

```txt
s3-bsync Copyright (c) 2021 Joshua Stockin
<https://joshstock.in>
<https://git.joshstock.in/s3-bsync>

This software is licensed and distributed under the terms of the MIT License.
See the MIT License in the LICENSE file of this project's root folder.

This comment block and its contents, including this disclaimer, MUST be
preserved in all copies or distributions of this software's source.
```

<<https://joshstock.in>> | [josh@joshstock.in](mailto:josh@joshstock.in) | joshuas3#9641