hello stranger and welcome to the everarch presentation!

just a few words on what you can do here. press the round ? button to hide or show this help. press the 🕪 button to play the presentation together with audio comments. you can pause and resume the presentation by pressing the spacebar on your keyboard. you can also use the left and right arrow keys to navigate between the slides.

what is everarch?

© 2022 Markus Peröbner

everarch — the hopefully ever lasting archive

people have personal data, sometimes structured

they want to store it, find it again, maybe back it up

people evolve over the years, data structures change

everarch is a set of applications to handle this

john is a contact

everarch uses a storage system which might be unusual if you come from relational databases. in everarch you record your data in statements. for example 'john is a contact'.

but not only statements can be stored within everarch. you can also upload files like this picture of a glacier. or any other file, even files with multiple gigabytes in size.

the picture's author is john

statements can reference files or other statements. so we can store that the picture's author is john.

everything is a claim

to simplify things, everything is a claim in everarch. no matter if it's a statement or file. claims can reference each other just like in a graph database.

everarch wants you to write claims

<i-declare who="evr" is="awesome!"/> (yes, you guessed it already, claims are XML)

everarch wants you to write claims. here for example we see a claim that states how awesome everarch really is. you can use your own elements and attributes as long as it's xml.

claims are XML because XML has namespaces

<contact xmlns="https://my-ns" …/>

also claims are xml because xml has namespaces.

this gives you the opportunity to record your own statements about the world without worrying you will clash with somebody else's claims in the future.

claims are XML because XML has XSLT

(so claims can easily be transformed)

claims are xml because xml has xslt, a powerful xml transformation language. we will later use it to transform claims from your custom schemas into special everarch schemas which act as an API between you and everarch.

everarch brings a file claim

<file
    xmlns="https://evr.ma300k.de/claims/"
    xmlns:dc="http://purl.org/dc/terms/"
    dc:title="hello-world.txt"
    >
  <body>
    <slice ref="sha3-224-7eda3e8d26f147821a258850956f9ed640fb0b3a8a04ae56a2f58a32" size="12"/>
  </body>
</file>

everarch brings a file claim. this claim is used to define actual files stored in everarch. you can see it declares the file's name in the dc:title attribute. the file's content is split into slices which together make up the file as a whole.
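
to make the slicing idea concrete, here is a minimal python sketch. it is not everarch's implementation: the function name split_into_slices, the tiny slice size and the assumption that a slice's ref is simply "sha3-224-" followed by the SHA3-224 hex digest of the slice's bytes are all made up for illustration.

```python
import hashlib


def split_into_slices(data: bytes, slice_size: int):
    """Cut data into consecutive slices and describe each one.

    Returns a list of (ref, size, chunk) tuples. The ref format is an
    assumption modelled on the ref attribute shown in the file claim.
    """
    slices = []
    for start in range(0, len(data), slice_size):
        chunk = data[start:start + slice_size]
        ref = "sha3-224-" + hashlib.sha3_224(chunk).hexdigest()
        slices.append((ref, len(chunk), chunk))
    return slices


content = b"hello world!"
# a slice size of 5 bytes is unrealistically small, chosen to get several slices
slices = split_into_slices(content, slice_size=5)
sizes = [size for _, size, _ in slices]
reassembled = b"".join(chunk for _, _, chunk in slices)
```

concatenating the slices in the order they are listed gives back the original content, which is what the body of a file claim expresses.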

what is that sha3-224-7eda3e8d26f1gibberish…???

but what's that sha gibberish within the slices?

it's a content addressable storage key everarch assigns to your data

it's a content addressable storage key everarch assigns to your data. no matter if you are storing claims or binary data like a file's slices.

key	data
sha3-224-abc…	hello world!
sha3-224-def…	<file xmlns=…

so in real world applications every piece of content has its own unique sha key. putting the same data into the store will always produce the same key.

the evr-glacier-storage application is providing this kind of key/value store.
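
the key/value idea can be sketched in a few lines of python. this is only a toy stand-in and not evr-glacier-storage itself: ToyStore, evr_key and the assumption that the key is "sha3-224-" plus the SHA3-224 hex digest of the raw bytes are hypothetical.

```python
import hashlib


def evr_key(data: bytes) -> str:
    """Derive a key from the content alone (assumed key format)."""
    return "sha3-224-" + hashlib.sha3_224(data).hexdigest()


class ToyStore:
    """In-memory content addressable key/value store (hypothetical)."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = evr_key(data)
        # the same data always maps to the same key, so storing it again
        # leaves the existing entry untouched
        self.blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]


store = ToyStore()
key_a = store.put(b"hello world!")
key_b = store.put(b"hello world!")
key_c = store.put(b"<file xmlns=...")  # stand-in bytes for a claim
```

note that the caller never chooses a key. the store derives it from the data, so claims and binary slices can share the same store.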

why are content addressable storages with sha3-224 so nice?

but why are content addressable storages with sha hashes so nice?

evr-glacier-storage can detect external data modification

with sha hashes everarch glacier storage can detect external data modifications.

such modifications can occur if bits on your hard disk flip randomly because of hardware errors. everarch can detect these errors and fix them when synchronizing with a backup. sha3 is a cryptographically secure hash function. that's why it is also practically impossible for a third party to modify your data without changing the sha3 hash.
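
a rough python sketch of this check, assuming the key is "sha3-224-" plus the SHA3-224 hex digest of the stored bytes. the names find_corrupted and repair_from_backup are hypothetical and say nothing about how evr-glacier-storage organizes its data on disk.

```python
import hashlib


def evr_key(data: bytes) -> str:
    return "sha3-224-" + hashlib.sha3_224(data).hexdigest()


def find_corrupted(blobs: dict) -> list:
    """Return the keys whose stored bytes no longer hash to the key."""
    return sorted(key for key, data in blobs.items() if evr_key(data) != key)


def repair_from_backup(blobs: dict, backup: dict) -> int:
    """Replace corrupted blobs with intact copies from the backup."""
    repaired = 0
    for key in find_corrupted(blobs):
        # only trust the backup copy if it passes the same check
        if key in backup and evr_key(backup[key]) == key:
            blobs[key] = backup[key]
            repaired += 1
    return repaired


original = b"hello world!"
key = evr_key(original)
backup = {key: original}
# simulate a hardware error: flip the lowest bit of the first byte
storage = {key: bytes([original[0] ^ 0x01]) + original[1:]}

corrupted_before = find_corrupted(storage)
repaired = repair_from_backup(storage, backup)
corrupted_after = find_corrupted(storage)
```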

uploading that video of your wedding twice will lead to the same sha3-224 keys, so the video will only be stored once in evr-glacier-storage

(no duplication? that's good because my disk is always nearly full)

uploading that video of your wedding twice will lead to the same sha keys. so the video will only be stored once. on the second upload everarch detects that the sha keys already exist and skips them.
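
the skipping on the second upload could look roughly like this in python. the upload function, the plain dict used as a store and the three fake video slices are assumptions for the sake of the example.

```python
import hashlib


def upload(store: dict, slices) -> tuple:
    """Store every slice under its key and return (written, skipped)."""
    written = skipped = 0
    for data in slices:
        key = "sha3-224-" + hashlib.sha3_224(data).hexdigest()
        if key in store:
            skipped += 1  # the key exists, so the content is already stored
        else:
            store[key] = data
            written += 1
    return written, skipped


# three made-up slices standing in for a large video file
video_slices = [b"intro", b"ceremony", b"party"]
store = {}
first = upload(store, video_slices)
second = upload(store, video_slices)
```

the first upload writes all three slices. the second one writes nothing and the store still holds three entries.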

images are uploaded to separate everarch installations

📱 your phone

💻 your PC

upload 🐱.jpg and 🐭.jpg

upload 🐶.jpg and 🐭.jpg

uploads to separate everarch installations also benefit from producing the same sha keys for the same content. imagine uploading exactly the same mouse image on your phone and on your PC separately, together with some other images.

later they synchronize their uploads

evr-glacier-storage will contain 🐱.jpg, 🐶.jpg and 🐭.jpg exactly once

later the everarch storages on the phone and PC are synchronized. after synchronization every image is stored exactly once on each device, even the mouse image.

backups are fast

storage	backup
sha3-224-1…	sha3-224-1…
sha3-224-2…
sha3-224-3…	sha3-224-3…

backups are fast.

they are fast because any synchronization between two installations can just compare the sha keys. only data for missing keys must be copied.
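
as a sketch of why this is cheap, here is the phone and PC example in python. only the sets of keys are compared, and blob data is copied just for keys the other side lacks. the sync function and the fake image bytes are hypothetical. a real synchronization would run over a network.

```python
import hashlib


def evr_key(data: bytes) -> str:
    return "sha3-224-" + hashlib.sha3_224(data).hexdigest()


def sync(source: dict, target: dict) -> int:
    """Copy blobs missing in target and return how many were copied."""
    missing = source.keys() - target.keys()  # set difference on keys only
    for key in missing:
        target[key] = source[key]
    return len(missing)


def store_of(*images: bytes) -> dict:
    return {evr_key(image): image for image in images}


cat, dog, mouse = b"cat image bytes", b"dog image bytes", b"mouse image bytes"
phone = store_of(cat, mouse)
pc = store_of(dog, mouse)

copied_to_pc = sync(phone, pc)     # only the cat image is missing on the PC
copied_to_phone = sync(pc, phone)  # only the dog image is missing on the phone
```

the mouse image is never transferred because both sides already know its key.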

how are we going to find anything if we only have sha3 hashes?

now using sha keys is fine. but how are we going to find anything if we only have sha hashes?

we need an index… evr-attr-index

we need an index. everarch attribute index.

it reads through all claims

everarch attribute index reads through all the claims within everarch glacier storage.

transforms each claim into key/value attributes

every claim is transformed into simple key/value attribute definitions.

makes the attributes searchable

finally the transformed attributes are indexed and made searchable.

evr-attr-index can only index attr claims

<attr xmlns="https://evr.ma300k.de/claims/">
  <a k="a-key" v="a-value"/>
</attr>

so as mentioned earlier we have our custom claims defined by ourselves. these must be converted into attr claims because that's the only thing everarch attribute index can actually index.

so you must provide an XSLT to transform your claims into attr claims

<contact xmlns="https://my/ns/">
  <first-name>Markus</first-name>
  <last-name>Peröbner</last-name>
</contact>

<attr xmlns="https://evr.ma300k.de/claims/">
  <a k="name" v="Markus Peröbner"/>
</attr>

the transformation of the custom claims you defined yourself is done by an XSLT stylesheet. you must also provide that stylesheet. so for example you can have your own contact claim with custom fields as shown on the left hand side. but the transformation must output attr claims as shown on the right hand side.
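
the mapping such a stylesheet has to perform can be illustrated in python. to be clear, this is not XSLT and not how evr-attr-index runs transformations. it only shows the input and output shapes from the slide using the standard library's ElementTree, and contact_to_attr is a hypothetical name.

```python
import xml.etree.ElementTree as ET

MY_NS = "https://my/ns/"
EVR_NS = "https://evr.ma300k.de/claims/"


def contact_to_attr(contact_xml: str) -> ET.Element:
    """Turn a custom contact claim into an attr claim with a name attribute."""
    contact = ET.fromstring(contact_xml)
    parts = [
        contact.findtext(f"{{{MY_NS}}}{tag}")
        for tag in ("first-name", "last-name")
    ]
    # join only the name parts that are present and not empty
    name = " ".join(p.strip() for p in parts if p and p.strip())
    attr = ET.Element(f"{{{EVR_NS}}}attr")
    ET.SubElement(attr, f"{{{EVR_NS}}}a", {"k": "name", "v": name})
    return attr


contact_xml = """<contact xmlns="https://my/ns/">
  <first-name>Markus</first-name>
  <last-name>Peröbner</last-name>
</contact>"""

attr = contact_to_attr(contact_xml)
```

in a real setup the same mapping would be expressed as XSLT templates matching the contact element in your own namespace.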

then you meet your first Javanese friend and realize there might be no last-name… also their addresses are totally different from what you designed in the first place

so one fine day you will meet your first javanese friend and realize there might be no last name at all. also their addresses are totally different from the way you designed it in the first place.

you need to migrate? 😨

(you are really afraid of migrations because you remember that old saying)

ohh ohh… you need to migrate your old claims now? and yes, you are really afraid of migrations because you remember that old saying.

"Three migrations in a row is like burnt down once."

— Markus Peröbner

three migrations in a row is like burnt down once. markus peröbner. meaning with every migration you will miss one thing in your data and lose it forever. enough migrations and the glory of your legacy data will be gone.

in everarch you never modify your legacy data… you introduce new claims and extend the way you index

in everarch you never modify your legacy data. you introduce new claims as you need them and extend the way you index. so you will have a transformation for your old contact claim with the simplistic address definition. and you will also have a transformation for your new, more powerful contact claim with the now perfect address definition. and if you mess up the transformation you will realize it one day, fix the transformation, reindex and have a good index afterwards. nothing will be lost because of a migration that was only 99% good.
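
here is a deliberately simplified python sketch of that idea. claims are shown as plain dicts instead of XML, and the claim types contact-v1 and contact-v2, their fields and the name "Budi" are invented for this example. the point is only that the list of claims is never touched while the set of transformations grows and the index is rebuilt from scratch.

```python
# stored claims are immutable: old and new formats live side by side
claims = [
    {"type": "contact-v1", "first-name": "Markus", "last-name": "Peröbner"},
    {"type": "contact-v2", "full-name": "Budi"},  # no last name needed
]


def index_contact_v1(claim: dict) -> dict:
    return {"name": f'{claim["first-name"]} {claim["last-name"]}'}


def index_contact_v2(claim: dict) -> dict:
    return {"name": claim["full-name"]}


def reindex(claims: list, transforms: dict) -> list:
    """Rebuild the index by running every claim through its transformation."""
    return [
        transforms[claim["type"]](claim)
        for claim in claims
        if claim["type"] in transforms
    ]


# before the new claim type was known, only v1 claims were indexed
old_index = reindex(claims, {"contact-v1": index_contact_v1})
# adding a transformation and reindexing picks up the new claims as well
new_index = reindex(
    claims,
    {"contact-v1": index_contact_v1, "contact-v2": index_contact_v2},
)
```

fixing a buggy transformation works the same way: replace the function and reindex. the stored claims are the single source of truth.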

so how can things be updated if you can't delete or modify legacy data?

maybe by now you are asking how things can be updated if you can't delete or modify legacy data?

attr claims can indicate they want to update a previously uploaded seed claim

<attr
    xmlns="https://evr.ma300k.de/claims/"
    seed="sha3-224-7eda3…58a32-0000"
    >
  <a op="-" k="a-key" v="a-value"/>
</attr>

(in this case we remove the value)

attribute claims can indicate they want to update a previously uploaded claim called the seed claim. the seed claim is referenced by a claim reference. the claim reference is just a pointer to the location in the storage where the claim is persisted. that's why the reference contains a sha hash.

a seed example with a single contact

so here we have a seed example with a single contact from your address book. the seed claim is the one on the left hand side which was uploaded first and does not reference another seed claim. later another claim was uploaded specifying the birthday. that later uploaded claim must reference the original contact claim with the seed attribute.
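
how attributes from several claims could be merged per seed is sketched below in python. this is a guess at the general idea and not evr-attr-index's actual algorithm. claims are reduced to dicts, the refs are short placeholders instead of real claim references, and treating a missing op as "add the value" is an assumption. only op="-" for removal is taken from the slide.

```python
from collections import defaultdict


def aggregate(claims: list) -> dict:
    """Merge attr claims in upload order into one attribute set per seed."""
    seeds = defaultdict(lambda: defaultdict(set))
    for claim in claims:
        # a claim without a seed attribute is a seed itself
        seed_ref = claim.get("seed") or claim["ref"]
        attrs = seeds[seed_ref]
        for op, key, value in claim["attrs"]:
            if op == "-":
                attrs[key].discard(value)
            else:  # assumption: no op means the value is added
                attrs[key].add(value)
    return {
        seed: {k: sorted(vs) for k, vs in attrs.items() if vs}
        for seed, attrs in seeds.items()
    }


claims = [
    # the seed claim: uploaded first, references no other claim
    {"ref": "ref-contact",
     "attrs": [(None, "name", "john"), (None, "tag", "todo")]},
    # uploaded later, adds the birthday to the same seed
    {"ref": "ref-birthday", "seed": "ref-contact",
     "attrs": [(None, "birthday", "1990-01-01")]},
    # uploaded last, removes the tag again
    {"ref": "ref-untag", "seed": "ref-contact",
     "attrs": [("-", "tag", "todo")]},
]

aggregated = aggregate(claims)
```

all three claims stay in the storage unchanged. only the aggregated view of the seed changes over time.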

how can we find seeds?

we can now update our claims but how can we find seed claims?

evr-attr-index has its own query language

select * where tag=todo

everarch attribute index has its own query language. in some aspects it is reminiscent of SQL.

will respond with something like

sha3-224-23451f0a911555ec5e8aaddc0aeeb7993246aa61a573f0bc3019b177-0000
	title=first claim
	tag=todo
sha3-224-47444db3f2a8448843bbad8f5ca9c44fa750d587b37eb9426fb59732-0000
	title=second claim
	tag=todo

the query above might return something like this. a list of seed claims indicated by the seed's claim reference together with the aggregated attributes of that seed.
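
to illustrate what such a query does conceptually, here is a tiny python sketch that handles exactly one query shape, "select * where key=value", against an in-memory dict of aggregated seeds. it is not the real query language implementation. run_query, format_result and the third seed with its placeholder ref are hypothetical, while the first two refs are the ones from the slide.

```python
def run_query(query: str, seeds: dict) -> dict:
    """Return the seeds that have the given value for the given key."""
    prefix = "select * where "
    if not query.startswith(prefix):
        raise ValueError("this sketch only supports 'select * where key=value'")
    key, _, value = query[len(prefix):].partition("=")
    key, value = key.strip(), value.strip()
    return {
        ref: attrs for ref, attrs in seeds.items()
        if value in attrs.get(key, [])
    }


def format_result(result: dict) -> str:
    """Render seeds as a ref line followed by tab indented key=value lines."""
    lines = []
    for ref, attrs in result.items():
        lines.append(ref)
        for key, values in attrs.items():
            lines.extend(f"\t{key}={value}" for value in values)
    return "\n".join(lines)


seeds = {
    "sha3-224-23451f0a911555ec5e8aaddc0aeeb7993246aa61a573f0bc3019b177-0000":
        {"title": ["first claim"], "tag": ["todo"]},
    "sha3-224-47444db3f2a8448843bbad8f5ca9c44fa750d587b37eb9426fb59732-0000":
        {"title": ["second claim"], "tag": ["todo"]},
    # placeholder ref for a seed that should not match
    "placeholder-ref-of-a-finished-task":
        {"title": ["third claim"], "tag": ["done"]},
}

result = run_query("select * where tag=todo", seeds)
text = format_result(result)
```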

we have emacs integration

everarch has emacs integration to make browsing seeds and creating claims more fun.

query evr-attr-index and browse results with major-mode

you can query everarch attribute index and browse the results in a nicely styled major mode with handy keyboard shortcuts.

create claims from templates and post them to evr-glacier-storage

you can create claims from templates with just a keypress and post them to everarch glacier storage. seed attributes are automatically added when you need them.

thanks for listening!

everything else can be found in the everarch git repo and the everarch documentation

now thanks for listening. i hope you enjoyed the presentation. everything else can be found in the everarch git repository if you want to get started exploring everarch.