I recently came across some valuable research published by Universidad FASTA in Buenos Aires, which focuses on new developments in RAID configurations. I will share the research in two separate articles to make it more digestible.
Over the past eight years, the Research Group on Operating Systems and Computer Forensics of Universidad FASTA in Buenos Aires has been working to define and improve a Unified Process of
Information Retrieval (PURI) for RAID systems. This initial work was later extended and adapted
to smartphones as well as to distributed systems.
During the development of this process, the group identified areas of computer forensics
that lack adequate tools, which it calls "lacking niches".
Undergraduate and graduate school students presented a number of projects on these RAID systems.
In the study of Distributed Systems and Cloud Computing environments, the following issues were analyzed:
- Distributed computing environments
- The presence of servers and their impact
- Specific types of machines or cloud systems
One topic that gained special interest, both for its technical challenge
and because of the needs of computer forensic experts, was the reconstruction of disk arrays.
The use of RAID arrays with medium to large storage capacities presents several challenges to
computer forensic examiners, particularly those who do not have enough storage capacity to
hold the RAID volumes involved in their cases. In addition, there is no reliable standard procedure
for acquiring RAID arrays.
If the correct procedures are not followed,
the array becomes little more than a worthless stack of disks (or disk images), as it would be difficult to access information in a coherent manner.
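To illustrate why a set of disk images is hard to read without the array parameters, here is a minimal sketch. It assumes plain round-robin striping with no parity; the function names, the 4-byte chunk size and the three-disk layout are hypothetical choices for the demonstration and are not taken from the paper.

```python
def stripe(data: bytes, n_disks: int, chunk: int) -> list[bytes]:
    """Distribute data round-robin across n_disks in chunks of `chunk` bytes."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return [bytes(d) for d in disks]


def reassemble(images: list[bytes], chunk: int) -> bytes:
    """Rebuild the volume by reading one chunk from each image in turn."""
    out = bytearray()
    offset = 0
    while any(offset < len(img) for img in images):
        for img in images:
            out += img[offset:offset + chunk]
        offset += chunk
    return bytes(out)


original = b"The quick brown fox jumps over the lazy dog."
images = stripe(original, n_disks=3, chunk=4)

# Correct disk order and chunk size: the volume is recovered.
correct = reassemble(images, chunk=4)

# Wrong disk order or wrong chunk size: same bytes, incoherent content.
wrong_order = reassemble(images[::-1], chunk=4)
wrong_chunk = reassemble(images, chunk=8)

print(correct == original)      # True
print(wrong_order == original)  # False
print(wrong_chunk == original)  # False
```

Every byte is still present in the two failed attempts, but the content is scrambled, which is the "stack of disks" situation described above.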
Their most recent paper presents the basic concepts of RAID and file systems.
The simulations it describes required a problem scenario, a test environment, and a proposed technique for performing the test.
THEORETICAL FRAMEWORK
RAID arrays
RAID (Redundant Array of Independent Disks) is a technology that combines
multiple storage devices so that, for all intents and purposes, they behave as a single consolidated disk. RAID establishes a synergy between the devices that make up the array,
providing the following advantages:
- Performance: the joint operation of multiple devices enables parallel read and write operations, which would not be possible with a single device.
- Speed: a higher transfer rate is achieved by distributing the data across multiple devices.
- I/O operations per second: because access to data located on different disks can be parallelized, the array can serve more operations at the same time.
- Fault tolerance: in some of its modes of operation, RAID provides data redundancy. In these cases the failure of a disk does not compromise the information, although performance is degraded until the device is replaced and the array is restored.
- Capacity: combining the devices yields a single device with a capacity equal to or larger than that of each individual device.
- Cost-effectiveness: these characteristics are obtained by combining real disks of relatively low cost. A single device with the same features as a disk array, if it existed, would probably be much more expensive. Therefore, RAID arrays have a much lower cost per unit of storage.
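The fault-tolerance point can be made concrete with a small sketch. It assumes XOR parity, one common way of providing redundancy in parity-based RAID modes; the article has not said which scheme the paper uses, and the block contents and the `xor_blocks` helper are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One block per data disk (illustrative contents).
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block would be stored on an additional disk.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk and rebuilding it from the survivors.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'BBBB'
```

Rebuilding requires reading every surviving disk, which is one reason performance drops until the failed device is replaced.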
The RAID configuration used determines the extent to which the data is protected.
As the follow-up article will show, some configurations sacrifice storage capacity
for redundancy, or vice versa (sacrifice redundancy for greater storage
capacity). There are also configurations that strike a balance between capacity
and redundancy, but they come with a tradeoff of lower performance.
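The capacity side of this tradeoff can be worked through with simple arithmetic. The sketch below assumes the standard RAID levels 0 (striping), 1 (mirroring) and 5 (striping with one disk's worth of parity) and equally sized disks. This section does not name specific levels, so the choice of levels and the `usable_capacity` helper are assumptions made for illustration.

```python
def usable_capacity(level: int, n_disks: int, disk_size_tb: float) -> float:
    """Return usable capacity in TB for an array of equally sized disks."""
    if level == 0:
        # Striping only: all space is usable, no redundancy.
        return n_disks * disk_size_tb
    if level == 1:
        # Mirroring: every disk holds the same data.
        return disk_size_tb
    if level == 5:
        # One disk's worth of space is consumed by parity.
        if n_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (n_disks - 1) * disk_size_tb
    raise ValueError(f"level {level} is not covered by this sketch")


for level in (0, 1, 5):
    print(level, usable_capacity(level, n_disks=4, disk_size_tb=2.0))
# 0 8.0
# 1 2.0
# 5 6.0
```

With four 2 TB disks, striping exposes all 8 TB but tolerates no failure, mirroring exposes only 2 TB, and the parity layout sits in between.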
The configuration metadata of the disks that make up a RAID array is stored in
a structure called the RAID superblock, or DDF (Disk Data Format) structure in SNIA
nomenclature. This structure stores the information needed to determine which array and
virtual drive each disk belongs to, as well as the configuration parameters, such as parity type,
stripe size, cache settings and other factors.
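As a rough sketch of how such metadata helps an examiner, the example below groups a pile of disk images into arrays and orders them by position. The `RaidMetadata` fields, the image names and the `group_by_array` helper are hypothetical simplifications; they do not reproduce the actual DDF or superblock on-disk layout, and parsing real metadata is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidMetadata:
    """Hypothetical, simplified view of the per-disk RAID metadata."""
    array_id: str         # identifies the array the disk belongs to
    disk_index: int       # position of the disk within the array
    raid_level: int       # configuration parameter
    stripe_size_kib: int  # stripe (band) size


def group_by_array(disks: dict[str, RaidMetadata]) -> dict[str, list[str]]:
    """Map each array ID to its member images, ordered by disk position."""
    arrays: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for image_name, meta in disks.items():
        arrays[meta.array_id].append((meta.disk_index, image_name))
    return {
        array_id: [name for _, name in sorted(members)]
        for array_id, members in arrays.items()
    }


# Metadata as it might look after being read from five unlabeled images.
found = {
    "img_c.dd": RaidMetadata("array-1", 2, 5, 64),
    "img_a.dd": RaidMetadata("array-1", 0, 5, 64),
    "img_x.dd": RaidMetadata("array-2", 0, 1, 64),
    "img_b.dd": RaidMetadata("array-1", 1, 5, 64),
    "img_y.dd": RaidMetadata("array-2", 1, 1, 64),
}

layout = group_by_array(found)
print(layout["array-1"])  # ['img_a.dd', 'img_b.dd', 'img_c.dd']
print(layout["array-2"])  # ['img_x.dd', 'img_y.dd']
```

With membership, disk order and stripe size known, the images can be reassembled in a coherent way rather than by trial and error.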