I recently got a few (5) hard drives to turn my home server into a NAS with TrueNAS SCALE, and my idea is to have 4 usable and 1 for redundancy. My question is: how does RAID work? What are RAID 0, RAID 5, software RAID, etc., and does any of that even matter for my use case?

  • Nibodhika@lemmy.world
    English
    arrow-up
    36
    ·
    9 months ago

    You have a 5GB file:

    RAID 0: Each of your 5 disks stores 1GB of that data in alternating chunks (e.g. the first disk has bytes 1, 6, 11, the second disk has 2, 7, 12, etc.), occupying a total of 5GB. When you access it, all disks read in parallel, so you get up to 5x the speed of a single disk. However, if one of the disks goes away you lose the entire file.

    RAID 1: The file is stored entirely on two disks, occupying 10GB, giving a read speed of 2x, and if any single disk fails you still have your entire data.

    RAID 5: Split the file into only 3 chunks similar to above, call them A, B and C. Disk 1 has AB, disk 2 has BC, disk 3 has AC, and the other two disks have nothing. This occupies a total of 10GB and is read at most at 3x the speed of a single disk, but if any single one of the 5 disks fails you still have all of your file available. However, if 2 disks fail you might incur data loss.

    That’s a rough idea and not entirely accurate, but it’s a good representation to understand how they work on a high level.
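
The round-robin striping described for RAID 0 above can be sketched in a few lines of Python. This is only an illustration: `stripe` and `unstripe` are hypothetical helper names, and real arrays stripe in much larger chunks than single bytes.

```python
def stripe(data: bytes, n_disks: int, chunk: int = 1) -> list[bytes]:
    """Distribute data round-robin across n_disks in chunks of `chunk` bytes."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        # Chunk number i // chunk goes to the next disk in rotation.
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return [bytes(d) for d in disks]


def unstripe(disks: list[bytes], chunk: int = 1) -> bytes:
    """Reassemble the original data; every disk must be present and intact."""
    out = bytearray()
    offsets = [0] * len(disks)
    total = sum(len(d) for d in disks)
    disk = 0
    while len(out) < total:
        out += disks[disk][offsets[disk]:offsets[disk] + chunk]
        offsets[disk] += chunk
        disk = (disk + 1) % len(disks)
    return bytes(out)


data = bytes(range(1, 16))   # "bytes 1..15" standing in for the file
disks = stripe(data, 5)      # disk 0 gets bytes 1, 6, 11; disk 1 gets 2, 7, 12; ...
```

Because each disk holds only every fifth byte, there is no way to rebuild the file if any one of them is missing, which is why RAID 0 offers no redundancy.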

    • Aiyub@feddit.de
      English
      arrow-up
      3
      arrow-down
      7
      ·
      9 months ago

      Better explanation of raid 5:

      You have 5GB of data and 5 disks. You split your data into 4 parts and store one on each of 4 disks. Then disk 5 remembers whether there is an odd or even number of 1s at each position on the other disks. So whichever disk fails, you can work out what it held by counting whether the rest is odd or even. You lose the capacity of 1 disk but keep the full capacity of the others. No doubling like suggested before.
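
The odd/even counting described above is a bitwise XOR. A minimal sketch, assuming four equal-sized data blocks and one parity block (the block contents and the `xor_blocks` helper are made up for illustration):

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data disks, each holding one part of the file.
data_disks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]

# The fifth disk stores the parity of the other four.
parity = xor_blocks(data_disks)

# Suppose the third data disk dies: XOR the survivors with the parity block.
survivors = data_disks[:2] + data_disks[3:] + [parity]
rebuilt = xor_blocks(survivors)
```

Here `rebuilt` equals the lost block `b"CCCC"`. The same trick works no matter which single disk is lost, but not if two are lost at once.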

  • tburkhol@lemmy.world
    English
    arrow-up
    14
    ·
    9 months ago

    Traditionally, RAID-0 “stripes” data across exactly 2 disks, writing half the data to each, trying to get twice the I/O speed out of disks that are much slower than the data bus. This also has the effect of looking like one disk twice the size of either physical disk, but if either disk fails, you lose the whole array. RAID-1 “mirrors” data across multiple identical disks, writing exactly the same data to all of them, again for higher I/O performance, but providing redundancy instead of size. RAID-5 is like an extension of RAID-0 or a combination of -0 and -1, writing data across multiple disks, with one disk’s worth of extra ‘parity’ for error correction. It requires (n) identical-sized disks but gives you storage capacity of (n-1), and allows you to rebuild the array in case any one disk fails. Any of these look to the filesystem like a single disk.

    As @ahto@feddit.de says, none of those matter for TrueNAS. Technically, TrueNAS creates “JBOD” (just a bunch of disks) and uses the file system to combine all those separate disks into one logical structure. From the user’s perspective, these all look exactly the same, but ZFS allows for much more complicated distributions of data and more diverse sizes of physical disks.

    • taladar@sh.itjust.works
      English
      arrow-up
      9
      ·
      9 months ago

      RAID-6 is basically the same as RAID-5 but with two extra disks instead of one, allowing for any two disks to fail and giving you n-2 capacity.
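
The capacity rules mentioned in this sub-thread (n for RAID-0, n-1 for RAID-5, n-2 for RAID-6) can be written down as a small helper. This is a sketch only: `usable_disks` is a hypothetical function, the minimum disk counts are the usual textbook ones, and RAID-1 is modelled as every disk holding the same data.

```python
def usable_disks(level: str, n: int) -> int:
    """Return how many disks' worth of capacity an array of n equal disks provides."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if n < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "0":
        return n        # striping only, no redundancy
    if level == "1":
        return 1        # every disk is a full copy
    if level == "5":
        return n - 1    # one disk's worth of parity
    return n - 2        # RAID 6: two disks' worth of parity


for level in ("0", "1", "5", "6"):
    print(f"RAID {level} with 5 disks: {usable_disks(level, 5)} usable")
```

For the five drives in the original question, this gives 4 usable disks under RAID-5 and 3 under RAID-6.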

  • R0cket_M00se@lemmy.world
    English
    arrow-up
    9
    ·
    9 months ago

    If you have four drives you can do RAID 6 assuming your controller supports it.

    RAID 0 just puts your data on multiple drives, giving you higher read/write speeds but with no built in redundancy.

    RAID 1 is just a copy: you have your data duplicated so that if anything fails there’s an immediate copy. No increase in RW speeds.

    RAID 5/6 use “parity data”, which operates somewhat like RNA/DNA when going through mitosis. Each of the four building blocks (T, C, G, A) pairs with only one of the others, so even if you have half the data (RNA) you know what the other half is by logical extension. The difference is that RAID 5 needs at least 3 drives whereas RAID 6 needs at least 4, and you can only withstand the failure of one drive in RAID 5 while RAID 6 can handle the loss of two.

    RAID 10 (one-zero, not “ten”) does exactly what the name suggests, it combines the direct copy of RAID 1 with the striping of RAID 0 to give you double RW speeds with redundancy.

    Each one will reduce your overall storage by a certain amount, either because of copying the data completely or taking up space for “parity data.” The only one that doesn’t do this is RAID 0, but you have absolutely no redundancy there, and if you’re considering RAID for home use I’m going to assume that’s important to you.

    • pory@lemmy.world
      English
      arrow-up
      3
      ·
      9 months ago

      I thought RAID1 enabled faster reads too, because both drives have the complete file. Writes don’t get a speed bump ofc, since those are still bottlenecked by the slowest single drive in the array

      • R0cket_M00se@lemmy.world
        English
        arrow-up
        2
        ·
        9 months ago

        That could be, I was trained in systems admin but work as a network engineer by profession. I’ve only set up one server in an enterprise environment and it was using RAID 6.

        I’d assume you could read from both disks at the same time though.

  • _danny@lemmy.world
    English
    arrow-up
    5
    ·
    9 months ago

    This is a good tool for visualizing your raid needs from your capacity and total number of drives.

    https://www.seagate.com/products/nas-drives/raid-calculator/

    I’ll preface that I’m no raid expert, just a nerd that uses it occasionally.

    The main benefit of most raid configurations is the redundancy they provide. If you lose one drive, you do not lose any data. It’s kinda obvious how you can have 1:1 redundancy: you just have an exact copy of the drive. But there are ways to split data into three chunks so that you can rebuild the data from any two chunks, and into 5 chunks so that you can lose any two chunks. Truly understanding how raid does this could easily be an entire college course.

    Raid 0 is the exception. All it does is “join together” a bunch of drives into one disk. And if you lose an individual disk you likely will lose most of your data.

    Another big difference is read/write speed. From my understanding, every raid configuration is slower to read and write than if you were using a single drive. Each raid configuration is varying levels of slower than the “base speed”

    I typically use raid 5 or 6, since that gives some redundancy, but I can keep most of my total storage space.

    The main thing in all of this is to keep an eye on drive health. If you lose more drives than your array can handle, all of your data is gone. From my understanding, there is no easy way to get the data off a broken raid array.

    • Presi300@lemmy.world (OP)
      English
      arrow-up
      3
      ·
      9 months ago

      I’ve mentioned it in another reply, but read/write speed isn’t terribly important to me, as the whole thing is gonna be bottlenecked by a 1 Gbps connection anyways. From what I read in the other replies and online, RAIDZ1 sounds like the thing I’m gonna go with, as it seems robust enough and my NAS is powerful enough for the performance hit to not really matter…
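
The bottleneck argument above is simple arithmetic: 1 Gbps is at most 125 MB/s before protocol overhead. A rough sketch, where the 600 MB/s array figure is purely an assumed example and both function names are hypothetical:

```python
def link_limit_mb_per_s(gigabits_per_s: float) -> float:
    """Theoretical ceiling of a network link in megabytes per second (no overhead)."""
    return gigabits_per_s * 1000 / 8


def effective_throughput(array_mb_per_s: float, link_gbps: float) -> float:
    """A client over the network sees whichever is slower: the array or the link."""
    return min(array_mb_per_s, link_limit_mb_per_s(link_gbps))


# Assumed example: an array that could deliver 600 MB/s locally.
print(effective_throughput(600.0, 1.0))
```

Under these assumptions the link, not the RAID layout, sets the speed a network client sees.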

  • poVoq@slrpnk.net
    English
    arrow-up
    5
    arrow-down
    1
    ·
    edit-2
    9 months ago

    That is way too broad a question to be answered here, and it also depends on the file system TrueNAS uses.

    If I remember correctly it uses ZFS by default and you can easily find some articles explaining the different raid levels of OpenZFS online.

    Edit: ZFS is not the same as other file-systems so not all of the general RAID info you can find online is 1:1 applicable for it (same with btrfs).

  • Decronym@lemmy.decronym.xyz (bot)
    English
    arrow-up
    4
    ·
    edit-2
    9 months ago

    Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

    Fewer Letters   More Letters
    NAS             Network-Attached Storage
    PCIe            Peripheral Component Interconnect Express
    RAID            Redundant Array of Independent Disks for mass storage
    SSD             Solid State Drive mass storage
    ZFS             Solaris/Linux filesystem focusing on data integrity

    5 acronyms in this thread; the most compressed thread commented on today has 6 acronyms.

    [Thread #352 for this sub, first seen 14th Dec 2023, 12:15]

    • rtxn@lemmy.world
      English
      arrow-up
      2
      ·
      edit-2
      9 months ago

      Seconded. ZFS needs to see the physical devices, so hardware RAID is out. It implements a RAID-5-like parity-based software solution called RAID-Z, and is capable of disk mirroring.

      It can work with hardware RAID or with a single physical disk, but don’t expect it to last for a long time, and definitely don’t use it beyond testing.

  • xia@lemmy.sdf.org
    English
    arrow-up
    2
    ·
    9 months ago

    0: “i don’t care about my data.”

    1: “i REALLY care about my data”

    5: “i’ll trade you one drive now, for my data if one of the drives dies later”

  • redline23@lemmy.world
    English
    arrow-up
    1
    ·
    9 months ago

    Other people gave a good explanation of raid and some alternatives like zfs in truenas.

    You want to avoid RAID5 with drives above 4TB. Every hard drive can hit an unrecoverable read error (URE) during a read. The chance is very low, and the rate is published in the drive’s specifications. During a RAID 5 rebuild after replacing a drive, the other drives are stressed for a long time. With high-capacity drives you have a pretty large chance of encountering a URE and losing the entire array. The high stress on the drives can also cause a drive failure if another drive was on its way out.
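
To put a rough number on that risk, here is a back-of-the-envelope sketch. It assumes a URE rate of 1 per 10^14 bits read (a figure often quoted on consumer drive datasheets; check your own drive) and treats every bit as an independent trial, which is a simplification. `rebuild_ure_probability` is a hypothetical helper name.

```python
import math


def rebuild_ure_probability(surviving_drives: int, drive_tb: float,
                            ure_rate_per_bit: float = 1e-14) -> float:
    """Chance of at least one URE while reading every surviving drive in full.

    Assumes independent bit errors at the given rate (a simplification).
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    # 1 - (1 - rate) ** bits_read, computed in a numerically stable way.
    return -math.expm1(bits_read * math.log1p(-ure_rate_per_bit))


# Five-disk RAID 5 with one failed drive: the four survivors are read in full.
for size_tb in (1.0, 4.0, 8.0):
    print(f"{size_tb:>4} TB drives: {rebuild_ure_probability(4, size_tb):.0%}")
```

The model is pessimistic in practice, since real drives often beat their datasheet rate, but it shows why the risk grows quickly with drive size.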

    I run TrueNAS CORE at home with a pool that looks like RAID 10: two mirror volumes striped together for performance.

    I never played around with raidz1 (like RAID 5), but you still have the chance of a URE during the resilver. I can’t comment on whether that is likely or what happens when such an error occurs. I did see people recommending raidz2, which tolerates two disk failures, to avoid losing data during a resilver.

  • sj_zero@lotide.fbxl.net
    arrow-up
    1
    arrow-down
    1
    ·
    9 months ago

    The level of raid is fundamental to the operation of your raid array.

    As I recall, RAID 0 is striping. It will give you faster throughput because your array can pull values out of multiple drives at once. RAID 1 is mirroring. In that, half of the drives are used for data, and the other half are used to back up the first half. RAID 5 is parody, and that’s what you’re looking for. Essentially, your drives will mostly be used for storing data, with the last one used to track what information is on the other four, so you will have one drive for redundancy and the other four will be storing data.

    Hardware raid versus software raid matters to the extent that parity calculations are relatively expensive and so if you’re trying to do RAID 5 on software raid, that’s going to eat up more of your CPU power and reduce your drive throughput.

    I don’t recall TrueNAS in particular, and what you are using the NAS for is really what is important, but I do recall that some NAS software doesn’t even want you to be using hardware raid because it will be using its own software algorithms that are separate from what you would typically consider to be raid.

    • Valen@lemmy.world
      English
      arrow-up
      1
      ·
      9 months ago

      Raid 5 is parity, not parody 😀. Each drive contains part of the information of the other drives, so that if any one of the drives dies, you can still get all the information (it will just be slower until you replace it and the system rebuilds the data on the new drive).
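
The point that each drive holds part of the redundancy, rather than one drive being "the parity drive", can be pictured with a rotating layout. The rotation order below is an assumption chosen for illustration (real implementations differ in the exact order), and `parity_disk` and `layout` are hypothetical names.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Index of the disk holding parity for a given stripe (assumed rotation)."""
    return (n_disks - 1 - stripe) % n_disks


def layout(n_stripes: int, n_disks: int) -> list[list[str]]:
    """Build a table of stripes: 'P' marks parity, 'D<k>' marks the k-th data block."""
    rows = []
    for s in range(n_stripes):
        p = parity_disk(s, n_disks)
        row, d = [], 0
        for disk in range(n_disks):
            if disk == p:
                row.append("P")
            else:
                row.append(f"D{s * (n_disks - 1) + d}")
                d += 1
        rows.append(row)
    return rows


for row in layout(5, 5):
    print("  ".join(f"{cell:>3}" for cell in row))
```

Over five stripes on five disks, every disk takes one turn holding parity, so parity writes are spread across all drives instead of landing on a single one.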