# s3-bsync

Bidirectional syncing tool to sync local filesystem directories with S3
buckets. Written by [Josh Stockin](https://joshstock.in).

**Work in progress. Not in a functional state. Do NOT use this.**

### Behavior

After an initial sync (in which conflicts and files not common to both hosts
are handled manually), the S3 bucket takes precedence. Files with the same
size and modification time on both hosts are ignored. A newer copy of a file
always overwrites the corresponding older copy, regardless of any changes made
to the older copy. (In other words, **there is no manual conflict resolution
after the first sync. Conflicting files are handled automatically as described
here.** This script is meant to run without input or output by default, in a
cron job for example.) Untracked files, whether in S3 or on the local machine,
are copied to the opposite host and tracked. Tracked files that are moved or
removed on either host are moved or removed on the corresponding host, with
the tracking adjusted accordingly. Ultimately, after a sync, the
`.state.s3sync` state tracking file should match the contents of the S3
bucket's synced directories.

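The rules above can be read as a small decision table. The following is only a sketch of one possible reading, not the project's implementation: the names `Action`, `FileInfo`, and `decide` are hypothetical, a move is treated as a removal plus an untracked file, and "the S3 bucket takes precedence" is assumed to mean that S3 wins when modification times tie but sizes differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    SKIP = "skip"
    UPLOAD = "upload"                # copy local -> S3
    DOWNLOAD = "download"            # copy S3 -> local
    DELETE_LOCAL = "delete_local"
    DELETE_REMOTE = "delete_remote"
    UNTRACK = "untrack"              # drop a stale entry from the state file


@dataclass(frozen=True)
class FileInfo:
    size: int
    mtime: int  # seconds since the epoch (assumed unit)


def decide(local: Optional[FileInfo], remote: Optional[FileInfo], tracked: bool) -> Action:
    """Pick an action for one key; None means the file is absent on that host."""
    if local is None and remote is None:
        return Action.UNTRACK if tracked else Action.SKIP
    if local is None:
        # Tracked but gone locally: it was removed here, so remove it on S3.
        # Untracked: it is new on S3, so copy it down.
        return Action.DELETE_REMOTE if tracked else Action.DOWNLOAD
    if remote is None:
        return Action.DELETE_LOCAL if tracked else Action.UPLOAD
    if local.size == remote.size and local.mtime == remote.mtime:
        return Action.SKIP
    if local.mtime > remote.mtime:
        return Action.UPLOAD
    # S3 copy is newer, or the times tie and S3 takes precedence (assumption).
    return Action.DOWNLOAD
```

For example, `decide(FileInfo(10, 200), FileInfo(12, 100), True)` yields `Action.UPLOAD`, because the local copy is newer.
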
### Installation

Depends on `python3` and `aws-cli`. Both can be installed with your package
manager.

Install with `python3 setup.py install`. Root permissions are not required.
*This program does not manage S3 authentication or `aws-cli` credentials. You
must do this yourself with the `aws configure` command, or by other means such
as IAM/S3 policies.*

#### Source files

`setup.py` manages installation metadata.

#### Created files and .s3syncignore

The default file used to store sync information is `~/.state.s3sync`, but this
location can be reconfigured. The file uses the binary s3sync file format
specified later in this document. If you want to intentionally ignore
untracked files, use a `.s3syncignore` file, in the same manner as
[`.gitignore`](https://git-scm.com/docs/gitignore).

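As an illustration of what `.gitignore`-style matching could look like, here is a minimal sketch built on Python's standard `fnmatch` module. It covers only a small subset of the gitignore rules (comments, blank lines, bare name patterns, trailing-slash directory patterns, and patterns containing a slash) and does not handle negation with `!` or `**`. The helper names `load_patterns` and `is_ignored` are hypothetical and not part of s3-bsync.

```python
import fnmatch
from typing import Iterable, List


def load_patterns(lines: Iterable[str]) -> List[str]:
    """Keep non-empty, non-comment lines from a .s3syncignore-style file."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(line)
    return patterns


def is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """Return True if rel_path (using '/' separators) matches any pattern."""
    parts = rel_path.split("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: matches files below a directory of that name.
            name = pattern.rstrip("/")
            if any(fnmatch.fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif "/" in pattern:
            # Pattern with a slash: matched against the whole relative path.
            # Note: unlike gitignore, fnmatch lets '*' match across '/'.
            if fnmatch.fnmatchcase(rel_path, pattern.lstrip("/")):
                return True
        elif any(fnmatch.fnmatchcase(part, pattern) for part in parts):
            # Bare name pattern: matches any path component at any depth.
            return True
    return False
```

With the patterns `*.tmp` and `cache/`, both `notes/draft.tmp` and `cache/a.bin` would be skipped, while `docs/readme.md` would still be synced.
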
## s3sync file format

The `.state.s3sync` file saved in the home directory defines the state of
tracked objects from the specified S3 buckets and key prefixes used in the
last sync.

### Control bytes

90 - Begin bucket block
91 - End bucket block
92 - Begin directory map
93 - End directory map
94 - Begin object block
95 - End object block
96 - ETag type MD5
97 - ETag type null-terminated string (non-MD5)
98
99
9A - Begin metadata block
9B - End metadata block
9C
9D - File signature byte
9E
9F - File signature byte

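For reference, the assigned control bytes could be expressed in code roughly as follows. This is a sketch, not part of the project: the enum and member names are hypothetical, and the values are assumed to be hexadecimal (as `9A` through `9F` suggest). The bytes listed without a description (98, 99, 9C, 9E) are left out.

```python
from enum import IntEnum


class Control(IntEnum):
    """Assigned s3sync control bytes (values assumed to be hexadecimal)."""
    BEGIN_BUCKET = 0x90
    END_BUCKET = 0x91
    BEGIN_DIRECTORY_MAP = 0x92
    END_DIRECTORY_MAP = 0x93
    BEGIN_OBJECT = 0x94
    END_OBJECT = 0x95
    ETAG_MD5 = 0x96
    ETAG_STRING = 0x97       # null-terminated, non-MD5 ETag
    BEGIN_METADATA = 0x9A
    END_METADATA = 0x9B
    SIGNATURE_FIRST = 0x9D
    SIGNATURE_SECOND = 0x9F


# The two signature bytes followed by ASCII "S3" give the 4-byte file
# signature 9D 9F 53 33.
FILE_SIGNATURE = bytes([Control.SIGNATURE_FIRST, Control.SIGNATURE_SECOND]) + b"S3"
```
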
### File structure

```
Header {
    File signature - 4 bytes - 9D 9F 53 33
    File version - 1 byte - 01
}

Metadata block {
    Begin metadata block control byte - 9A
    Last synced time - 8 bytes uint
    End metadata block control byte - 9B
}

Bucket block {
    Begin bucket block control byte - 90
    Bucket name - null-terminated string
    Directory map {
        Begin directory map block control byte - 92
        Path to local directory - null-terminated string
        S3 key prefix - null-terminated string
        Recursive sync - 1 byte boolean
        End directory map block control byte - 93
    }...
    Recorded object {
        Begin object block control byte - 94
        Key - null-terminated string
        Last modified time - 8 bytes uint
        ETag type - 96 or 97
        ETag - 16 bytes or null-terminated string
        File size - 8 bytes uint
        End object block control byte - 95
    }...
    End bucket block control byte - 91
}...
```

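To make the layout above concrete, here is a minimal sketch that encodes the header and metadata block and round-trips a single recorded object using Python's `struct` module. Several details are assumptions because the format description does not state them: integers are treated as big-endian, strings as UTF-8, and times as whole seconds. All function names are hypothetical.

```python
import struct
from typing import Tuple

SIGNATURE = bytes.fromhex("9D9F5333")
VERSION = 1


def cstr(text: str) -> bytes:
    """Encode a null-terminated string (UTF-8 is an assumption)."""
    return text.encode("utf-8") + b"\x00"


def read_cstr(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a null-terminated string; return it and the next position."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def u64(value: int) -> bytes:
    """Encode an 8-byte unsigned integer (big-endian is an assumption)."""
    return struct.pack(">Q", value)


def encode_header_and_metadata(last_synced: int) -> bytes:
    return SIGNATURE + bytes([VERSION]) + b"\x9A" + u64(last_synced) + b"\x9B"


def encode_object(key: str, mtime: int, md5: bytes, size: int) -> bytes:
    """Encode one recorded object with an MD5 ETag (type byte 96)."""
    if len(md5) != 16:
        raise ValueError("an MD5 ETag must be exactly 16 bytes")
    return b"\x94" + cstr(key) + u64(mtime) + b"\x96" + md5 + u64(size) + b"\x95"


def decode_object(buf: bytes, pos: int = 0) -> Tuple[dict, int]:
    """Decode one recorded object; return it and the position after it."""
    if buf[pos] != 0x94:
        raise ValueError("expected begin object block control byte")
    key, pos = read_cstr(buf, pos + 1)
    (mtime,) = struct.unpack_from(">Q", buf, pos)
    pos += 8
    etag_type = buf[pos]
    pos += 1
    if etag_type == 0x96:            # 16 raw MD5 bytes
        etag = buf[pos:pos + 16].hex()
        pos += 16
    elif etag_type == 0x97:          # null-terminated, non-MD5 string
        etag, pos = read_cstr(buf, pos)
    else:
        raise ValueError("unknown ETag type byte")
    (size,) = struct.unpack_from(">Q", buf, pos)
    pos += 8
    if buf[pos] != 0x95:
        raise ValueError("expected end object block control byte")
    return {"key": key, "mtime": mtime, "etag": etag, "size": size}, pos + 1
```

A full reader would wrap `decode_object` in a loop inside each bucket block, stopping at the end bucket block control byte `91`.
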
## Copyright

This program is copyrighted by [Joshua Stockin](https://joshstock.in/) and
licensed under the [MIT License](LICENSE).

A form of the following should be present in each source file.

```txt
s3-bsync Copyright (c) 2021 Joshua Stockin
<https://joshstock.in>
<https://git.joshstock.in/s3-bsync>

This software is licensed and distributed under the terms of the MIT License.
See the MIT License in the LICENSE file of this project's root folder.

This comment block and its contents, including this disclaimer, MUST be
preserved in all copies or distributions of this software's source.
```

<https://joshstock.in> | [josh@joshstock.in](mailto:josh@joshstock.in) | joshuas3#9641