Hard(drive) lessons and why (not to) self-host your own services
In recent years, I have been exploring the advantages and disadvantages of self-hosting my own services for personal and community work. In this blog series, I outline a personal approach to self-hosting, discuss potentially relevant resources and the social consequences and implications of self-hosting, and wrap up by looking at the long-term sustainability of deploying systems at small scales. This is the first post in the series.
The perfect excuse – my photos
Self-hosting, for me, refers to serving and maintaining services within and across networks. In my setup, services are generally deployed from containers, and some or most of their data is stored persistently outside the container. Particular ports are enabled within my network so that access to services can be secured, and outside access to their web interfaces goes through, at a minimum, port forwarding and reverse proxies. Learning the terminology has taken time, and I have not rushed into it. Instead, I've taken a step-by-step approach, primarily focused on replacing particular systems that I consider critical for my online presence.
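To make the port forwarding and reverse proxy pieces more concrete, here is a minimal sketch in Python using only the standard library. It assumes the router forwards a single port (8080 here) to this proxy, which then decides, based on the requested hostname, which internal service should answer. The hostnames, IP address, and ports in `BACKENDS` are hypothetical, only GET requests are forwarded, and there is no TLS; in practice this job is usually done by dedicated software such as nginx or Caddy.

```python
# Minimal reverse-proxy sketch: route requests by hostname to services on the
# home network. Hostnames, IPs, and ports below are hypothetical placeholders.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Hypothetical mapping from public hostname to an internal service.
BACKENDS = {
    "photos.example.com": "http://192.168.1.10:8001",
    "files.example.com": "http://192.168.1.10:8002",
}


def pick_backend(host_header):
    """Return the internal base URL for a Host header, or None if unknown."""
    if not host_header:
        return None
    hostname = host_header.split(":", 1)[0].strip().lower()
    return BACKENDS.get(hostname)


class ProxyHandler(BaseHTTPRequestHandler):
    """Forwards GET requests only; a real proxy also handles other methods and TLS."""

    def do_GET(self):
        backend = pick_backend(self.headers.get("Host"))
        if backend is None:
            # Unknown hostnames never reach an internal service.
            self.send_error(404, "Unknown host")
            return
        try:
            with urlopen(Request(backend + self.path), timeout=10) as resp:
                status = resp.status
                ctype = resp.headers.get("Content-Type")
                body = resp.read()
        except HTTPError as err:
            # Pass backend error responses (404, 500, ...) through unchanged.
            status, ctype, body = err.code, err.headers.get("Content-Type"), err.read()
        except OSError:
            self.send_error(502, "Backend unreachable")
            return
        self.send_response(status)
        if ctype:
            self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Listen on `port`, assumed to be the only port the router forwards."""
    ThreadingHTTPServer(("0.0.0.0", port), ProxyHandler).serve_forever()
```

Calling `serve()` on a machine inside the network would start the proxy. The point of the design is that only one port is exposed: the services keep listening on their internal addresses, and any hostname not listed in `BACKENDS` gets a 404 instead of reaching the network.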
But why self-host? How did it happen, and why did I decide to learn something new? Was there any need for it? To partially answer these questions, I have to refer to some of the activities that bring me joy. I've been taking and collecting photos for roughly two-thirds of my life. Unfortunately, I don't have every photo I wish I had; in some situations, my brain goes back to particular frames from those photos, and those frames are what ultimately bring the moment back to me. Having access to those memories not only allows me to keep a version of my own history and upbringing, as well as those of my close relatives and friends, but also enables me to reconstruct particular scenarios I experienced based on what I was seeing through the viewfinder.
When I'm the subject of some of those photos, I somehow reflect in the third person – thinking about what I was doing, what I was saying, and filling in details about the context of the photograph. In contrast, when I'm the photographer, I tend to remember beyond the limited angle of the viewfinder. I get excited about the smiles of friends and family, the shyness of those hiding from my lens, the walk that led me to the person I was about to meet, or that frog perched quietly nearby. Early on, I understood that photos, not videos, were my access to memories that otherwise would be gone.
For the same event, I tend to value the photos I have taken as much as those taken by other people who were there. Yet the kind of information I'm able to extract from each set is completely different. They present different perspectives on what was relevant and what was worth highlighting at a given moment. In many cases, these sets also carry intrinsic biases that relate not only to what was photographed, but also to what was ultimately preserved. Perspectives, deletions, who was invited into the frame, lighting, and choice of lenses, among other factors, directly affect how the narrative is conceived, assembled, and ultimately reflected. Once the moment has passed, replicating or reproducing these images is virtually impossible. Preserving those memories has been important to me throughout my adult life. Knowing that I can trace how my family has changed through decades of photos, that I can pull up a photo of someone I haven't seen in more than 8 years from the last place we were together, that I can recall the date and location where I photographed an insect – all of this has enormous value for me. Yet, there are consequences.
What matters for this post is not just (1) the number of photos I store, (2) all the metadata associated with them, or even (3) how I manage to keep constant access to my media. For me, the key reason for keeping my own data on my own server is that it gives me a corner of privacy and sovereignty over my own files and their metadata.
The genesis
Five years ago, all of my photos were spread across three main systems. First, an external hard drive where I would drop off all the photos taken with my DSLR. That same hard drive also held folders I recovered from two old laptops that failed at some point. In total, those folders contained thousands of media files scattered across a DSLR-dedicated folder and thousands of other folders mirroring the structure of my other computers (e.g. documents, desktop, downloads, Facebook/Instagram downloads). Second, for years I had kept photos in my Google Drive. Storage used to be very generous, which coincided with the time when I was very active in sharing my photos with friends and family. However, I quickly realized that the basic storage was not enough, which pushed me to consider (and eventually purchase) a one-year subscription for 200 GB of storage. Days after seeing the charge on my credit card, I realized I had failed myself in multiple ways. I had spent years understanding and actively resisting Google's tactics, but at that point I was desperate for more storage. Third, I had a large number of photos in iCloud. Years of using iPhones meant that the most natural way to back up media was to Apple's cloud. As with Google, I ran out of space on iCloud, closed my eyes, and started paying a monthly subscription for their storage service. I forgot about those charges and enjoyed the convenience of seeing so many of my photos sorted and organized in a single place. I also enjoyed the facial recognition feature in their system.
Juggling subscriptions, and not being able to bring old phone photos, new phone photos, and thousands of media files from my DSLR or shared through other services together on my laptops, was getting frustrating. At some point, I decided to consolidate my media files on my external hard drive. I carefully exported all my media from iCloud and Google, but I still suspect some files were never exported from their servers. I didn't look back. I cancelled the subscriptions and started doing manual backups from my phone. In the long run, I knew this process was not going to be sustainable: I'm not someone who stays on top of things, especially largely optional ones. It was simply not convenient, so I kept reviewing options. After some research, I realized that photo management software was an option. I would plug my external hard drive into my laptop, open the software, and let it scan faces, tag photos by location, and so on. That automatic process, despite being full of mistakes, changed the way I saw my whole library. I started to see the same person through both the formality of my DSLR's viewfinder and the simplicity of my phone. The same person changed over time; I'm not sure whether that was related to how resolution fluctuates over time on the same phone (my personal observation) or simply a reflection of changing phones. I started to notice many more things as I watched all my files and their metadata get organized in front of my eyes.
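Consolidating exports from several services tends to produce the same photo under different names. A minimal sketch of how such duplicates can be spotted, assuming they are byte-identical copies (re-encoded or resized versions will not match): group files by size first, then hash only the files that share a size. The function names are hypothetical helpers, not part of any photo management tool.

```python
# Sketch: report byte-identical duplicate files under a folder.
# Only exact copies are detected; nothing is deleted.
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large videos don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups of paths under `root` whose contents are identical."""
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for p in paths:
            by_hash[sha256_of(p)].append(p)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

`find_duplicates` only reports groups of paths; choosing which copy to keep (for example, the one with the richer metadata) is left to the person curating the library.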
Hard(drive) lessons
One thing I never considered was the need to move my external hard drive out of my house. I used to simply keep it in my desk, ready to be plugged in to either transfer new photos or browse the ones already on it. The system was neither efficient nor safe, but it worked for what I wanted. Around this time, I was offered a faculty position that involved creating my own lab. I had saved data for years: old projects I never finalized, ideas for potential papers and grants, and scripts for already published analyses took up hundreds of gigabytes on my external hard drive. I had been collecting and organizing them for quite a long time. It was, for me, my own personal startup. One day, I decided that some of those files needed to be transferred to a new external hard drive I had just purchased for my lab. That new drive was, as I thought at the time, going to be the central storage location for my lab. We were going to do manual backups periodically, and all the information was going to be versioned on it at a regular frequency.
I took my hard drive to campus that day, started to move files, and quickly realized it was going to take much longer than I was willing to wait. I decided to take the hard drive back home and come up with another plan. Once in the car, I locked the hard drive in the glove box. Once home, I unlocked the glove box to get something else but forgot to take the hard drive out. The next day, while the car was parked near campus, someone broke a window, got in, grabbed the hard drive, and left the USB adapter on the passenger seat. A couple of days later, and $60 well (but shamefully) invested, the hard drive was back in my hands. How I recovered it is beyond the scope of this post. However, even with the hard drive back, the need to transfer files to a remote location was still there, and I needed to store work- and leisure-related files separately.
Accessing my data remotely
While I was reviewing external hard drives for the lab, one option caught my attention. The hard drive was nothing special – a few terabytes of storage on the usual commercial disks. However, it came with a short-term subscription that made the drive accessible over the internet. At that moment, I didn't understand what that meant, and I ended up purchasing a more classic solution, equivalent to the one I already had at home. I did, however, realize that accessing files on a hard drive through a website was something I wanted to understand better. A couple of months later, after researching the subject, I understood that Network Attached Storage (NAS) systems could be a viable solution, although I did not envision enabling access from outside my network until after I had purchased one.
Serving my own apps
Once I purchased my first NAS (a Synology DS220+), a couple of apps were of particular interest. Synology Photos seemed like the right place to store and browse all the photos I had kept for years, in a single location. Synology Drive, in turn, looked like a replacement for Google Drive.
I started to migrate all the photos from my external hard drive to my new NAS. I gathered every PNG and JPG file, and every image folder, into a main "Photos" folder. That same folder was also meant to store all new photos downloaded from my cameras and phone. The transfer took roughly one day; indexing all the photos, recognizing faces and objects, and sorting through metadata took a few more. By the end of that week, I had easy access to all of my photos through Synology Photos, something I had not even considered possible given the amount of data and my requirements for quality and speed. I did, however, have to upgrade the RAM to 18 GB (adding a 16 GB module to the 2 GB built into the unit). To date, the system works well: photos and videos are accessible at any time, and my phone backs up to it very frequently. I have not gone back to any of the services I used before.
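The step of gathering every PNG and JPG into a single "Photos" folder can be sketched as a small script. This is a hypothetical helper, not what Synology Photos does internally: it assumes an illustrative list of extensions in `IMAGE_EXTENSIONS`, keeps each source folder's layout under the destination to avoid name clashes, and skips anything already present rather than overwriting it.

```python
# Sketch: copy image files from a scattered source tree into one "Photos"
# folder. IMAGE_EXTENSIONS is an illustrative, hypothetical list.
import shutil
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".heic"}


def consolidate(source, photos_root):
    """Copy every image under `source` into `photos_root`, keeping subfolders.

    Files land under photos_root/<source folder name>/<relative path>.
    Returns the list of destination paths that were newly written.
    """
    source, photos_root = Path(source), Path(photos_root)
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        target = photos_root / source.name / path.relative_to(source)
        if target.exists():
            continue  # never overwrite something already in the library
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also tries to keep timestamps
        copied.append(target)
    return copied
```

Running it once per source (each old laptop backup, the DSLR folder, each cloud export) would build up the library, and running it again only copies files that are not there yet. Keeping the original timestamps matters because photo software often falls back on them when a file has no date in its metadata.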
At this point, the limiting factors were convenience (it was easy to keep using already integrated services out of inertia), time (I needed to do additional research to find alternatives), and money (monthly payments versus a larger one-time purchase). Having already touched on the first two, I'll address the financial one briefly. Purchasing a piece of equipment such as a NAS to run and serve apps implies several new expenses. First, one needs enough funds to purchase the hardware (costing >$400, plus hard drives).
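The monthly-versus-one-time trade-off comes down to a break-even point: the smallest number of months n for which n × (subscription − running cost) ≥ upfront cost. A minimal sketch, where only the hardware figure (over $400, plus disks) comes from this post and every other number is a hypothetical placeholder; it ignores replacement disks, repairs, and the value of one's time.

```python
# Sketch: months until a one-time NAS purchase matches cumulative
# subscription spending. All inputs are hypothetical placeholders.
import math


def break_even_months(upfront_cost, monthly_subscription, monthly_running_cost=0.0):
    """Smallest whole number of months after which self-hosting has paid off.

    Returns None if the subscription never costs more per month than
    running the NAS does (so the hardware never pays for itself).
    """
    monthly_saving = monthly_subscription - monthly_running_cost
    if monthly_saving <= 0:
        return None
    return math.ceil(upfront_cost / monthly_saving)
```

With hypothetical figures of $600 for the NAS and disks, $12 per month in subscriptions, and $3 per month for electricity, `break_even_months(600, 12, 3)` returns 67, a little over five and a half years.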
What's next?
Should everyone host their own services? Is this a skill everyone should learn? Should people worry about their privacy and data sovereignty enough to learn a new skill? I will address these questions in a second post.