For a few years now I've had a local NAS with RAID 1 disks to provide redundancy. I've gone from NAS to NAS and disk to disk with what feels like an endless failure rate, and I've got a good chunk of drives that are either recertified returns or failed drives out of warranty. A point most people reach is when you've used all the bays in your NAS, and the next step is either to rebuild it with larger and larger drives or to buy an "extension" that adds more bays to your existing one.
Well, this all seemed fine and I have no doubt it would work. After all, that's what we've been doing for decades now. But it felt old and somewhat costly: while capacities are getting bigger, each new capacity iteration comes with an increased price per gigabyte.
So I planned out a bit of a supercomputer, taking inspiration from Linus Tech Tips and his "7 Gamers, 1 CPU" build using unRAID. That way I could have the base RAID system and run Windows and Ubuntu on there too (a nice extra, and if you're going to do it, you may as well overdo it). However, in the final cost calculation it ended up being fairly expensive at just under £3,000.
A better way
So while I'm sure that would have worked, the upfront cost is fairly high. You could argue there are also "running costs", but I'd only really count electricity there, which is probably minimal. I'd argue the real cost goes into time; after all, no one's time is free. There's the time to build it, set up the RAID and maintain it, because drives will fail (extra cost there too). And don't forget: however much storage you need now, when you eventually use it all up you need the next round of capital, as the system doesn't really "scale".
So, what could be a better way? Recently I've been working more and more with Amazon Web Services, so I looked there for answers, initially looking for a way to store my existing files using some kind of block store. I looked into S3 pricing and found that storing 5 terabytes of data would cost nearly $145 a month, which is considerably expensive.
To add context to all this, I was looking for a way to store this data in Europe. That's not for legal reasons or because I don't want the data to travel; I generally don't care about that, as it's not sensitive information, it's purely a Plex library. The problem is latency: if enough latency is added, starting and scrubbing (rewinding, resuming, fast-forwarding) becomes near impossible, with huge delays.
So I started to look around the internet for other solutions. I've tabulated the results for each service and the reason I went for or against it:
|Service||Why I went for or against it|
|Amazon Cloud Drive||Looks like this was a popular one for this kind of setup. In the US, Amazon offers unlimited storage for $59 a year; in the UK or Europe there's no similar deal, with 1 TB costing £340 per year|
|Google Drive||This one wasn't priced too badly, but there were a lot of negative reports about speed, potentially not an issue with the service itself but with the Fuse connector|
|AWS S3||By far too expensive, even though I'm sure it'd be the fastest and most reliable|
|HubiC||Seems to be owned by OVH but has a maximum data transfer rate of 10 Mb/s, which sounds painful and not great for concurrency|
|OVH Block Store||Lowest price I could find in Europe at £0.007 per gigabyte per month; however, that can still add up and become expensive|
|BackBlaze B2||Offers the lowest price per gigabyte I've seen at $0.005, but is in the US. They're not a new company and seem to be fairly reliable, so that's what I opted for|
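To put the per-gigabyte prices side by side, here's a rough sketch in Python of what a 5 TB library would cost per month. It assumes 1 TB = 1,000 GB, takes the per-gigabyte prices and the $145 S3 figure straight from the text above, and leaves each price in its own currency rather than guessing an exchange rate.

```python
# Rough monthly cost of a 5 TB library at the per-gigabyte prices quoted above.
LIBRARY_GB = 5 * 1000  # assumption: 1 TB = 1,000 GB

# (currency symbol, price per GB per month), as listed in the table
PRICES = {
    "OVH Block Store": ("£", 0.007),
    "BackBlaze B2": ("$", 0.005),
}

S3_STANDARD_MONTHLY_USD = 145  # the 5 TB figure quoted earlier for standard S3


def monthly_cost(price_per_gb, size_gb=LIBRARY_GB):
    """Monthly bill for size_gb at a flat per-gigabyte price."""
    return round(price_per_gb * size_gb, 2)


for name, (currency, price) in PRICES.items():
    print(f"{name}: {currency}{monthly_cost(price):.2f} per month")

b2_monthly = monthly_cost(PRICES["BackBlaze B2"][1])
print(f"Standard S3 is about {S3_STANDARD_MONTHLY_USD / b2_monthly:.1f}x the B2 price")
```

This ignores download and request charges, which some of these providers bill separately, so treat it as a storage-only comparison.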
I set up an EC2 instance in Ireland (I figured AWS's backbone to the US should be pretty reliable, even though BackBlaze only has a single datacentre on the west coast of the US) and proceeded to try to mount the B2 Fuse connector. Under the Amazon Linux AMI I had a fair few problems with the required Fuse kernel module being missing. After a few hours of effort, the consensus seemed to be to jump to Ubuntu, so if you're going to do this on EC2, don't use the Amazon Linux AMI.
With Ubuntu, and actually having the kernel module this time, everything was set up and B2 was mounted as if it were just part of the local file system. I quickly noticed Plex was having some issues. I wasn't really sure where the bottleneck was, but I did notice that the B2 Fuse Python connector kept running out of memory.
After doing a speed test, I ended up with:
ubuntu@ip-172-31-41-223:/mnt/multimedia$ dd if=/dev/zero of=/mnt/multimedia/testfile bs=128M count=1 oflag=direct
1+0 records in
1+0 records out
134217728 bytes (134 MB) copied, 318.648 s, 421 kB/s
Pretty poor, and it explains why Plex was struggling to stream the content back to the client. In fact, my first attempt at this actually failed:
ubuntu@ip-172-31-41-223:/mnt/multimedia$ dd if=/dev/zero of=/mnt/multimedia/testfile bs=1G count=1 oflag=direct
dd: error writing ‘/mnt/multimedia/testfile’: Bad address
dd: closing output file ‘/mnt/multimedia/testfile’: Bad address
But at this point I'm speculating. Could it be EC2's low network performance? Well, I pinged B2 and got about 150 ms of latency, which is not great. I decided to get a bucket going on S3 just to test and hopefully see where the issue was. So I got my bucket set up, installed S3FS and repeated my test (I mounted S3 in the same place):
ubuntu@ip-172-31-41-223:/mnt/multimedia$ dd if=/dev/zero of=/mnt/multimedia/testfile bs=128M count=1 oflag=direct
1+0 records in
1+0 records out
134217728 bytes (134 MB) copied, 8.63128 s, 15.6 MB/s
As you can see, network performance was significantly improved. Clearly S3 was the way forward, but that $145 a month was fairly off-putting. Researching a little deeper, there is a reduced redundancy option for S3 (it's actually set on a per-object basis) that more than halves the cost, at a mere $0.012 per gigabyte.
Network performance probably improved because S3 is local to the EC2 instance in Ireland. Even so, I'd still have expected better performance from BackBlaze B2; this could be down to a badly written Fuse connector, and you may have a totally different experience.
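To make the gap between the two mounts concrete, here's a small Python sketch that parses the summary line dd prints and recomputes throughput from the byte count and elapsed time. The two summary strings are copied from the runs above; the regular expression is my own assumption about the layout of that line and may need adjusting for other dd versions.

```python
import re

# Matches e.g. "134217728 bytes (134 MB) copied, 318.648 s, 421 kB/s"
DD_SUMMARY = re.compile(r"(\d+) bytes .* copied, ([\d.]+) s")


def parse_dd(summary):
    """Return (bytes_written, seconds) from a dd summary line."""
    match = DD_SUMMARY.search(summary)
    if match is None:
        raise ValueError(f"unrecognised dd output: {summary!r}")
    return int(match.group(1)), float(match.group(2))


def throughput(summary):
    """Bytes per second for one dd run."""
    size, seconds = parse_dd(summary)
    return size / seconds


b2_run = "134217728 bytes (134 MB) copied, 318.648 s, 421 kB/s"
s3_run = "134217728 bytes (134 MB) copied, 8.63128 s, 15.6 MB/s"

b2_rate = throughput(b2_run)
s3_rate = throughput(s3_run)

print(f"B2 mount: {b2_rate / 1e3:.0f} kB/s")
print(f"S3 mount: {s3_rate / 1e6:.1f} MB/s")
print(f"S3 was roughly {s3_rate / b2_rate:.0f}x faster for this write")
```

Bear in mind this is a single 128 MB write per mount, so it's an indication rather than a benchmark.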
Moving into S3
So, from planning and initial testing, we've gleaned that running Plex in the cloud is possible and works amazingly well (with a small subset of trailers); it's like having your own Netflix available from anywhere. Infrequent Access on S3 (STANDARD_IA) is an interesting name they've given it: it still replicates data, and it suggests they very rarely lose objects (if at all?). And even if they do, chances are you:
- don't care if they're lost
- can easily re-rip them
- have a local copy
A lot of this was starting to make sense and actually become feasible. So let's take the $63 or so a month that works out to and call it £45 a month. For the initial setup cost of that supercomputer we talked about earlier, we can run the same collection in the cloud on S3 for 5 years and 6 months.
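That break-even figure is just division, sketched here in Python. It uses the rounded numbers from the text (£3,000 upfront, £45 a month) and ignores price changes, electricity and exchange-rate movement, which are assumptions of this sketch.

```python
def break_even(upfront_cost, monthly_cost):
    """Whole (years, months) of cloud storage the upfront cost would pay for."""
    months = int(upfront_cost // monthly_cost)
    return divmod(months, 12)


SUPERCOMPUTER_GBP = 3000  # "just under £3,000" for the home-built option
S3_MONTHLY_GBP = 45       # the rounded monthly S3 cost used above

years, months = break_even(SUPERCOMPUTER_GBP, S3_MONTHLY_GBP)
print(f"{years} years and {months} months")
```

Any drive replacements on the home-built side would push the break-even point further out.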
Mounting with EC2
So, I used Fuse with Ubuntu on EC2 to connect to the S3 bucket. The first couple of files worked quite well, but that's where the positivity ends: after a week, library scans were taking 12+ hours.
This is where the problem lies, though I'm unsure what specifically the problem is. When Plex does a library scan it analyses the media, and in doing that S3FS downloads the entire file. So while data transfer between the EC2 instance and S3 is free, it was taking an incredibly long time to download and scan, which also saturated the network connection and caused data rate issues (while a library scan was running, you couldn't stream).
This became too big a problem to continue with and would only escalate. Maybe there was a better virtual file system to use, but short term there had to be a better solution. Maybe a native file system?
Local File System
So, almost 12 months passed between the last sentence and this one, and in that time I've been through a lot of iterations of this. I did attempt to use a VPS company in the Netherlands called TransIP. The reason for this was that they had locally connected "block storage", which is basically seen as a local disk (with, they claim, 10 Gb/s connectivity).
It was often slow, and if it was connected at 10 Gb/s, that must have been shared across their entire VM cluster: I saw speeds between 10 and 30 MB/s. It was a nice-ish solution but came at a price too. At €10 per 2 TB, it was costing €30 per month just for the collection storage.
However, I'll give them credit: it's a super nice interface, very intuitive and easy to use, and it's especially easy to attach their block storage to a VM. It was just a shame about the price.
I needed to find another solution that was cost-effective but had similar or better performance.
Amazon Cloud Drive - The Holy Grail
There was some talk of this already when I was doing my research; however, it wasn't available in the UK until, during my TransIP testing, I got an email announcing that it was. So I signed up straight away and came across acd_cli, which uses Fuse and mounts the drive as a local file system.
So I set up a server, ran through the acd_cli setup and mounted the disk. From initial tests it seemed to be running well. Performance was fantastic, and in some cases I managed 60 MB/s (megabytes). Looking through iftop, it looked like it was actually just using S3. Spare capacity they're looking to utilise, perhaps?
Super Slow Library Scanning
However, library scanning was super slow. Looking through the logs, there seemed to be a delay after every file when creating the "hash". It seems that their hash requires part of the file (instead of, say, that other unique part, the file path, or even stat'ing the file, which was instant as it was precached). Anyway, my complaint went unanswered and, I guess, this is potentially unsupported: https://forums.plex.tv/discussion/233241/slow-path-matching-hashing#latest.
This works really well if you can get over the initial library scanning time, and that is a one-off: subsequent scans were considerably quicker, as the number of files that had changed was considerably lower. And then, my praises came too quickly.
Literally in the last week (May 2017), Amazon has started banning third-party applications such as acd_cli and rclone. This was really unexpected and happened very suddenly. I'm not sure quite what triggered this, or why now. I could potentially understand it if they had a replacement product, but their Cloud Drive app doesn't even work on Linux, leaving a lot of people out in the cold. There's no real way to resolve this other than migrating to a provider that supports what you want to do. I think Amazon may have had a change of plan for what they want to deliver with their Drive.
This has been an absolute nightmare, and a very slow nightmare at that. How do you access your drive when third-party applications have been banned from accessing it and the officially supported application doesn't mount as a drive (nor work on Linux, but we'll overlook that for now)? You're unlikely to have somewhere to download all of that at speed, store it, and then upload it somewhere else.
I started looking into other services that would do this in the cloud. I tried MultCloud, and with transfer speeds of 800 KB/s that was going to take a long time! And now even longer: it completely stopped syncing a day after I signed up (have they been banned now too? Or is it user overload?).
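To get a feel for what "a long time" means at 800 KB/s, here's a back-of-the-envelope sketch in Python. The 5 TB library size is an assumption borrowed from the S3 pricing earlier (the actual size at this point isn't stated), units are decimal, and the rate is treated as sustained, which real transfers rarely manage.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def transfer_days(size_bytes, bytes_per_second):
    """Days needed to move size_bytes at a constant rate."""
    return size_bytes / bytes_per_second / SECONDS_PER_DAY


LIBRARY_BYTES = 5 * 10**12     # assumption: 5 TB library, decimal units
MULTCLOUD_RATE = 800 * 10**3   # 800 KB/s, as seen with MultCloud
ACD_CLI_PEAK = 60 * 10**6      # 60 MB/s, the best case seen with acd_cli

print(f"At 800 KB/s: {transfer_days(LIBRARY_BYTES, MULTCLOUD_RATE):.1f} days")
print(f"At 60 MB/s:  {transfer_days(LIBRARY_BYTES, ACD_CLI_PEAK):.1f} days")
```

Under those assumptions the MultCloud route comes out at roughly 72 days, against about a day at the best speed acd_cli managed.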
Currently, I have a large Windows VM set up in GCloud (ingress is free, so data from Amazon into the instance is free, and data from the instance to GDrive is free: free free) with NetDrive mounting the Google Drive account as a disk (I tried others, including ExpanDrive, but it was horrific, and every CloudBerry product just errored constantly). I then installed the official Amazon Cloud Drive app, selected a folder, and symlinked the folders within it into the mounted GDrive:
mklink /D "C:\Users\andrew_s\Amazon Drive\Folder" D:\Folder
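If there are a lot of folders to link, the same step can be scripted. This is only a sketch of the idea in Python, not what I actually ran: `link_folders` and the folder names in the usage comment are hypothetical, and on Windows creating symlinks normally needs an elevated prompt or Developer Mode.

```python
import os
from pathlib import Path


def link_folders(names, sync_root, mount_root):
    """Create sync_root/<name> as a directory symlink to mount_root/<name>.

    sync_root stands in for the Amazon Drive sync folder and mount_root for
    the drive letter where Google Drive is mounted.
    """
    created = []
    for name in names:
        target = Path(mount_root) / name
        link = Path(sync_root) / name
        # Make sure the destination folder exists on the mounted drive.
        target.mkdir(parents=True, exist_ok=True)
        # Skip anything already present so re-running is harmless.
        if not link.exists():
            os.symlink(target, link, target_is_directory=True)
        created.append(link)
    return created


# Hypothetical usage, mirroring the mklink command above:
# link_folders(["Folder"], r"C:\Users\andrew_s\Amazon Drive", "D:\\")
```

The effect is the same as running mklink /D once per folder: the Amazon app writes into what it thinks is its own sync folder, and the files land on the mounted drive.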
This works, but you'll find it downloads 4 files at a time, waits until they've downloaded to disk, and then NetDrive uploads them into GDrive (the Amazon app will continue to download another 4 while the previous 4 sit at "100%"). Speeds vary between 50 Mb/s and 1.2 Gb/s, so mileage may vary. I'm not sure where the primary slowness is. It's probably Windows, which is just bad at this stuff, but it could also be NetDrive or the Amazon Cloud Drive app.
GDrive - The New Found Holy Grail (for now?)
So far, with GDrive having unlimited storage (using GSuite), it could be the one. Speeds seem fast (in places, depending on the app), and Google seems to be more open when it comes to their APIs and their support for Linux (I mean, Android, right?). Maybe I'll come back to this in another year and provide another update.
Conclusion and what we've learnt
Since I started this back in mid-2016, solutions and products have changed. As soon as you put your data in the cloud, you're at the mercy of the cloud provider, but I still fully believe this is the future. Ten years ago, getting 5 gigabytes free was considered really good. Now we're all looking at putting terabytes in the cloud, and as storage gets cheaper, those savings are passed on to consumers.
Let's not forget that cloud providers like Google and Amazon have economies of scale on their side; storage per gigabyte is likely considerably cheaper for them than for us "consumers".
The other thing that changed was that Plex actually launched their own cloud service, making part of my quest irrelevant. I've been experimenting with Plex Cloud and Google Drive, and so far, so good.
My final thoughts: this was a lot harder than I thought, even for 2017. I'd like to see storage providers be more open (let's not forget, while these services are "cheap", they're not free; we're still paying for them, so I'd like to see better access to our data).
However, if you'd like to achieve this now, I'd recommend GDrive and Plex Cloud. If you care about your data, make sure you keep a local copy for now, at least until the landscape changes. But I've gone full cloud.