hello stranger and welcome to the everarch presentation!
just a few words on what you can do here. press the round ?
button to hide and show this help. pressing the 🕪 button
will play the presentation together with audio
comments. you can pause and resume the presentation
by pressing the spacebar on your keyboard. you can also
use the left and right arrow keys to navigate between
the slides.
they want to store it, find it again, maybe back it up
people evolve over the years, data structures change
everarch is a set of applications to handle this
john is a contact
everarch uses a storage system which might be unusual if you
come from relational databases. in everarch you record your
data in statements. for example 'john is a contact'.
but not only statements can be stored within everarch. you
can also upload files like this picture of a glacier. or any
other file, even files with multiple gigabytes in size.
the picture's author is john
statements can reference files or other statements. so we
can store that the picture's author is john.
everything is a claim
to simplify things, everything is a claim in everarch. no
matter if it's a statement or file. claims can reference
each other just like in a graph database.
everarch wants you to write claims
<i-declare who="evr" is="awesome!"/>
(yes, you guessed it already, claims are XML)
everarch wants you to write claims. here for example we
see a claim that states how awesome everarch really is. you
can use your own elements and attributes as long as it's
xml.
claims are XML because XML has namespaces
<contact xmlns="https://my-ns" …/>
also claims are xml because xml has namespaces.
this gives you the opportunity to record your own statements
about the world without worrying you will clash with
somebody else's claims in the future.
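a minimal sketch of why namespaces prevent clashes, written in python with the standard library xml parser. the second namespace (https://their-ns) and the name attribute are made up for illustration and are not part of everarch.

```python
import xml.etree.ElementTree as ET

# two hypothetical claims that both use the element name "contact".
# "https://their-ns" and the "name" attribute are illustrative assumptions.
mine = ET.fromstring('<contact xmlns="https://my-ns" name="john"/>')
theirs = ET.fromstring('<contact xmlns="https://their-ns" name="john"/>')

# ElementTree exposes the qualified name as "{namespace}local-name",
# so the two elements stay distinguishable although both are "contact".
print(mine.tag)
print(theirs.tag)


def is_my_contact(element):
    """Return True only for contact claims from my own namespace."""
    return element.tag == "{https://my-ns}contact"
```

a check like is_my_contact(theirs) returns False, so a transformation written for your own contact claims would not pick up somebody else's contact claims by accident.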
claims are XML because XML has XSLT
(so claims can easily be transformed)
claims are xml because xml has xslt, a powerful xml
transformation language. we will later use it to transform
claims from your custom schemas into special everarch schemas
which act as an API between you and everarch.
everarch comes with a file claim. this claim is used to
define actual files stored in everarch. you can see it
declares the file's name in the title attribute. the file's
content is split into slices which together make up the
whole file.
what is that sha3-224-7eda3e8d26f1gibberish…???
but what's that sha gibberish within the slices?
it's a content addressable storage key everarch assigns to your data
it's a content addressable storage key everarch assigns to
your data. no matter if you are storing claims or binary
data like a file's slices.
key              data
sha3-224-abc…    hello world!
sha3-224-def…    <file xmlns=…
so in real-world applications every piece of content has its
own unique sha key. putting the same data into the store will
always produce the same key.
the evr-glacier-storage application provides this kind
of key/value store.
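the idea can be sketched in a few lines of python. this is not how evr-glacier-storage is implemented. it only assumes that a key is the prefix sha3-224- followed by the hex digest of the content, as the keys on the slides suggest. the names storage_key, put and store are hypothetical.

```python
import hashlib


def storage_key(data: bytes) -> str:
    """Derive the key from the content alone.

    Assumed format: 'sha3-224-' followed by the hex digest.
    """
    return "sha3-224-" + hashlib.sha3_224(data).hexdigest()


# an in-memory dict standing in for the key/value store
store = {}


def put(data: bytes) -> str:
    """Store the data under its content-derived key and return that key."""
    key = storage_key(data)
    store[key] = data
    return key


# claims and binary data are stored the same way
claim_key = put(b'<i-declare who="evr" is="awesome!"/>')
blob_key = put(b"hello world!")
```

calling put with the same bytes again returns the same key, because the key depends on nothing but the content.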
why are content addressable storages with sha3-224 so nice?
but why are content addressable storages with sha hashes so
nice?
evr-glacier-storage can detect external data modification
with sha hashes everarch glacier storage can detect external
data modifications.
such modifications can occur if bits on your hard disk flip
randomly because of hardware errors. now everarch can detect
these errors and fix them when synchronizing with a backup.
sha3 is a cryptographically secure hash function. that's why
it is also practically impossible for a third party to modify
your data without changing its sha3 hash.
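a rough sketch of such a check in python. the helper names is_intact and repair are hypothetical, the key format is assumed to be sha3-224- plus hex digest, and two plain dicts stand in for the local storage and the backup.

```python
import hashlib


def storage_key(data: bytes) -> str:
    # assumed key format: 'sha3-224-' followed by the hex digest
    return "sha3-224-" + hashlib.sha3_224(data).hexdigest()


def is_intact(key: str, data: bytes) -> bool:
    """Data is intact if hashing it again yields the key it is stored under."""
    return storage_key(data) == key


def repair(local: dict, backup: dict) -> list:
    """Replace corrupted local entries with intact copies from the backup."""
    repaired = []
    for key, data in list(local.items()):
        if is_intact(key, data):
            continue
        if key in backup and is_intact(key, backup[key]):
            local[key] = backup[key]
            repaired.append(key)
    return repaired


original = b"hello world!"
key = storage_key(original)

# simulate a hardware error by flipping a single bit of the first byte
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

local = {key: corrupted}
backup = {key: original}
repaired = repair(local, backup)
```

after the call, repaired lists the one damaged key and local holds the original bytes again.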
uploading that video of your wedding twice will lead to the same sha3-224 keys, so the video will be stored only once in evr-glacier-storage
(no duplication? that's good because my disk is always nearly full)
uploading that video of your wedding twice will lead to the
same sha keys. so the video will be stored only
once. everarch detects on the second upload that the sha
keys already exist and skips them.
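deduplication falls out of the key scheme almost for free. a small sketch, assuming the same sha3-224- plus hex digest key format. the DedupStore class is hypothetical and not the everarch implementation.

```python
import hashlib


class DedupStore:
    """In-memory stand-in for a content addressable store."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes):
        """Return (key, stored), where stored is False if the key already existed."""
        key = "sha3-224-" + hashlib.sha3_224(data).hexdigest()
        if key in self.blobs:
            # same content was uploaded before: skip it
            return key, False
        self.blobs[key] = data
        return key, True


video = b"pretend these bytes are a slice of a large video"
dedup_store = DedupStore()
first_key, first_stored = dedup_store.put(video)
second_key, second_stored = dedup_store.put(video)
```

both uploads report the same key, but only the first one actually stores anything.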
images are uploaded to separate everarch installations
📱 your phone: upload 🐱.jpg and 🐭.jpg
💻 your PC: upload 🐶.jpg and 🐭.jpg
uploads to separate everarch installations also benefit
from producing the same sha keys for the same
content. imagine uploading exactly the same mouse image on
your phone and PC separately together with some other images.
later they synchronize their uploads
evr-glacier-storage will contain 🐱.jpg, 🐶.jpg and 🐭.jpg exactly once
later the everarch storages on the phone and PC are
synchronized. after synchronization every image is stored
exactly once on each device, even the mouse image.
backups are fast
storage        backup
sha3-224-1…    sha3-224-1…
sha3-224-2…    (missing)
sha3-224-3…    sha3-224-3…
backups are fast.
they are fast because any synchronization between two
installations can just compare the sha keys. only data for
missing keys must be copied.
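the comparison of keys can be sketched as a set difference. this is an illustration only, with dicts standing in for the storages on the phone and the PC and short byte strings standing in for the images. the function names are hypothetical.

```python
import hashlib


def storage_key(data: bytes) -> str:
    # assumed key format: 'sha3-224-' followed by the hex digest
    return "sha3-224-" + hashlib.sha3_224(data).hexdigest()


def make_store(*blobs: bytes) -> dict:
    """Build a key -> data mapping for the given blobs."""
    return {storage_key(blob): blob for blob in blobs}


def copy_missing(source: dict, target: dict) -> int:
    """Copy only the entries whose keys the target does not have yet."""
    missing = source.keys() - target.keys()
    for key in missing:
        target[key] = source[key]
    return len(missing)


phone = make_store(b"cat image", b"mouse image")
pc = make_store(b"dog image", b"mouse image")

# synchronize in both directions
copied_to_pc = copy_missing(phone, pc)
copied_to_phone = copy_missing(pc, phone)
```

only one image travels in each direction. the mouse image is never transferred because both sides already hold its key.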
how are we going to find anything if we only have sha3 hashes?
now using sha keys is fine. but how are we going to find
anything if we only have sha hashes?
we need an index… evr-attr-index
we need an index. everarch attribute index.
it reads through all claims
everarch attribute index reads through all the claims within
everarch glacier storage.
transforms each claim into key/value attributes
every claim is transformed into simple key/value attribute
definitions.
makes the attributes searchable
finally the transformed attributes are indexed and made
searchable.
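the three steps can be sketched like this. in everarch the transformation is done with XSLT. the python function below only stands in for it and simply turns every xml attribute into a key/value pair. the claim references, the first-name and tag attributes and all function names are placeholders made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# placeholder claim references mapped to hypothetical custom claims
CLAIMS = {
    "sha3-224-aaa": '<contact xmlns="https://my-ns" first-name="john" tag="friend"/>',
    "sha3-224-bbb": '<contact xmlns="https://my-ns" first-name="jane" tag="friend"/>',
}


def to_attributes(claim_xml: str) -> list:
    """Stand-in for the transformation: one (key, value) pair per xml attribute."""
    element = ET.fromstring(claim_xml)
    return sorted(element.attrib.items())


def build_index(claims: dict) -> dict:
    """Read through all claims and index their attributes."""
    index = defaultdict(set)
    for ref, claim_xml in claims.items():
        for key, value in to_attributes(claim_xml):
            index[(key, value)].add(ref)
    return index


def search(index: dict, key: str, value: str) -> list:
    """Return the references of all claims carrying the given attribute."""
    return sorted(index.get((key, value), set()))


index = build_index(CLAIMS)
```

search(index, "tag", "friend") then returns both claim references, while search(index, "first-name", "john") returns only the first one.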
so as mentioned earlier we have our custom claims defined by
ourselves. these must be converted into attr claims because
that's the only thing everarch attribute index can actually
index.
so you must provide an XSLT to transform your claims into attr claims
the transformation of the custom claims you defined yourself
is done by an XSLT stylesheet. you must also provide that
stylesheet.
so for example you can have your own contact claim with
custom fields as shown on the left-hand side. but the
transformation must output attr claims as shown on the
right-hand side.
then you meet your first Javanese friend and realize there
might be no last-name… also their addresses are totally
different from what you designed in the first place
so one fine day you will meet your first javanese friend and
realize there might be no last name at all. also their
addresses are totally different from the way you designed it in
the first place.
you need to migrate? 😨
(you are really afraid of migrations because you remember that old saying)
ohh ohh… you need to migrate your old claims now? and yes,
you are really afraid of migrations because you remember
that old saying.
"Three migrations in a row is like burnt down once."
— Markus Peröbner
three migrations in a row is like burnt down once. markus
peröbner. meaning that with every migration you will miss one
thing in your data and lose it forever. after enough migrations
the glory of your legacy data will be gone.
in everarch you never modify your legacy data… you introduce new claims and extend the way you index
in everarch you never modify your legacy data. you introduce
new claims as you need them and extend the way you index. so
you will have a transformation for your old contact claim
with the simplistic address definition. and also you will
have a transformation for your new more powerful contact
claim with the now perfect address definition. and if you
mess up the transformation you will realize it one day, fix
the transformation, reindex and have a good index
afterwards. nothing will be lost because of a migration that
was only 99% correct.
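a sketch of what "extend the way you index" could look like. again python stands in for the XSLT transformations. the two namespaces, the attribute names and the example contacts are all invented for this illustration. the point is only that old and new claims stay untouched and each gets its own transformation.

```python
import xml.etree.ElementTree as ET

# hypothetical namespaces for the old and the new contact schema
OLD_CONTACT = "{https://my-ns/v1}contact"
NEW_CONTACT = "{https://my-ns/v2}contact"


def transform_old(element):
    """Old schema: first and last name plus a single address attribute."""
    name = element.get("first-name") + " " + element.get("last-name")
    return {"name": name, "address": element.get("address")}


def transform_new(element):
    """New schema: one name attribute and free-form address lines."""
    lines = [element.get("address-line-1"), element.get("address-line-2")]
    return {"name": element.get("name"),
            "address": ", ".join(line for line in lines if line)}


# fixing a broken transformation means changing an entry here and reindexing
TRANSFORMS = {OLD_CONTACT: transform_old, NEW_CONTACT: transform_new}


def reindex(claims):
    """Rebuild the index from the unchanged claims."""
    indexed = []
    for claim_xml in claims:
        element = ET.fromstring(claim_xml)
        transform = TRANSFORMS.get(element.tag)
        if transform is not None:
            indexed.append(transform(element))
    return indexed


claims = [
    '<contact xmlns="https://my-ns/v1" first-name="john" last-name="doe" address="1 main street"/>',
    '<contact xmlns="https://my-ns/v2" name="sari" address-line-1="house 2" address-line-2="village x"/>',
]
indexed = reindex(claims)
```

the claims list is never rewritten. if transform_old turns out to be wrong, only the function changes and reindex is run again.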
so how can things be updated if you can't delete or modify legacy data?
maybe you ask by now how things can be updated if you can't
delete or modify legacy data?
attr claims can indicate they want to update a previously uploaded seed claim
attribute claims can indicate they want to update a previously
uploaded claim called the seed claim. the seed claim is
referenced by a claim reference. the claim reference is just
a pointer to the location in the storage where the claim is
persisted. that's why the reference contains a sha hash.
a seed example with a single contact
so here we have a seed example with a single contact from
your address book. the seed claim is the one on the left
hand side which was uploaded first and does not reference
another seed claim. later another claim was uploaded
specifying the birthday. that later uploaded claim must
reference the original contact claim with the seed
attribute.
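how later claims add up on top of a seed can be sketched as follows. the dict layout, the placeholder references and the birthday value are invented for this example and do not reflect the real attr claim format.

```python
# claims in upload order. "seed" is None for a seed claim and holds the
# seed's claim reference for claims that update it.
# all references and values are placeholders.
uploaded_claims = [
    {"ref": "sha3-224-aaa", "seed": None,
     "attrs": {"name": "john"}},
    {"ref": "sha3-224-bbb", "seed": "sha3-224-aaa",
     "attrs": {"birthday": "1980-01-01"}},
]


def aggregate(claims):
    """Merge the attributes of every claim into the seed it belongs to."""
    seeds = {}
    for claim in claims:
        # a claim without a seed reference is its own seed
        seed_ref = claim["seed"] or claim["ref"]
        # later claims add to or override earlier attributes
        seeds.setdefault(seed_ref, {}).update(claim["attrs"])
    return seeds


seeds = aggregate(uploaded_claims)
```

the result contains a single seed, referenced by the contact claim's reference, with both the name and the birthday attached. neither of the two uploaded claims was modified.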
how can we find seeds?
we can now update our claims but how can we find seed
claims?
evr-attr-index has its own query language
select * where tag=todo
everarch attribute index has its own query
language. it resembles SQL in some aspects.
the query above might return something like this. a list of
seed claims indicated by the seed's claim reference together
with the aggregated attributes of that seed.
we have emacs integration
everarch has emacs integration to make browsing seeds and
creating claims more fun.
query evr-attr-index and browse results with major-mode
you can query everarch attribute index and browse the
results in a nicely styled major mode with handy keyboard
shortcuts.
create claims from templates and post them to evr-glacier-storage
you can create claims from templates with just a keypress
and post them to everarch glacier storage. seed attributes
are automatically added when you need them.
now thanks for listening. i hope you enjoyed the
presentation. everything else can be found in the everarch
github repository if you want to get started exploring
everarch.