Top 15 storage interview questions


What kind of storage do people actually use for VMware ESX servers?

VMware and many network evangelists try to tell you that sophisticated (=expensive) fiber SANs are the "only" storage option for VMware ESX and ESXi servers. Well, yes, of course. Using a SAN is fast, reliable and makes vMotion possible. Great. But: Can all ESX/ESXi users really afford SANs?

My theory is that less than 20% of all VMware ESX installations on this planet actually use Fibre Channel or iSCSI SANs. Most of these installations will be in larger companies that can afford them. I would predict that most VMware installations use "attached storage" (vmdks are stored on disks inside the server). Most of them run in SMEs, and there are so many of them!

We run two ESX 3.5 servers with attached storage and two ESX 4 servers with an iSCSI SAN. And the "real-life difference" between the two is barely noticeable :-)

Do you know of any official statistics for this question? What do you use as your storage medium?


Source: (StackOverflow)

ZFS Data Loss Scenarios

I'm looking toward building a largish ZFS pool (150TB+), and I'd like to hear people's experiences of data loss scenarios due to failed hardware, in particular, distinguishing between instances where just some data is lost vs. the whole filesystem (or if there even is such a distinction in ZFS).

For example: let's say a vdev is lost due to a failure like an external drive enclosure losing power, or a controller card failing. From what I've read, the pool should go into a faulted mode, but if the vdev is returned, should the pool recover? Or not? And if the vdev is partially damaged, does one lose the whole pool, some files, etc.?

What happens if a ZIL device fails? Or just one of several ZILs?
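
For context on the kind of layout in question, here is a hypothetical sketch (all device names are made up) of a pool with two raidz2 data vdevs and a mirrored log device, so that the loss of a single ZIL device is survivable; zpool status is what reports per-vdev health (ONLINE/DEGRADED/FAULTED):

zpool create tank \
    raidz2 sda sdb sdc sdd sde sdf \
    raidz2 sdg sdh sdi sdj sdk sdl \
    log mirror sdm sdn
zpool status tank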

Truly any and all anecdotes or hypothetical scenarios backed by deep technical knowledge are appreciated!

Thanks!

Update:

We're doing this on the cheap since we are a small business (9 people or so), but we generate a fair amount of imaging data.

The data is mostly smallish files, by my count about 500k files per TB.

The data is important but not uber-critical. We are planning to use the ZFS pool to mirror a 48TB "live" data array (in use for 3 years or so), and use the rest of the storage for 'archived' data.

The pool will be shared using NFS.

The rack is supposedly on a building backup generator line, and we have two APC UPSes capable of powering the rack at full load for 5 mins or so.


Source: (StackOverflow)

Do I need to RAID Fusion-io cards?

Can I run reliably with a single Fusion-io card installed in a server, or do I need to deploy two cards in a software RAID setup?

Fusion-io isn't very clear (almost misleading) on the topic in their marketing materials. Given the cost of the cards, I'm curious how other engineers deploy them in real-world scenarios.

I plan to use the HP-branded Fusion-io ioDrive2 1.2TB card for a proprietary standalone database solution running on Linux. This is a single server setup with no real high-availability option. There is asynchronous replication with a 10-minute RPO that mirrors transaction logs to a second physical server.

Traditionally, I would specify a high-end HP ProLiant server with the top CPU stepping for this application. I need to go to SSD, and I'm able to acquire Fusion-io at a lower price than enterprise SAS SSD for the required capacity.

  • Do I need to run two ioDrive2 cards and join them with software RAID (md or ZFS), or is that unnecessary? (A sketch of the md option follows this list.)
  • Should I be concerned about Fusion-io failure any more than I'd be concerned about a RAID controller failure or a motherboard failure?
  • System administrators like RAID. Does this require a different mindset, given the different interface and on-card wear-leveling/error-correction available in this form-factor?
  • What IS the failure rate of these devices?
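
For reference, the md option mentioned above would look roughly like this; /dev/fioa and /dev/fiob are assumed device names, and the filesystem and mount point are hypothetical:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/fioa /dev/fiob   # software mirror across both cards
mkfs.ext4 /dev/md0
mount /dev/md0 /var/lib/appdb   # hypothetical mount point for the database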

Edit: I just read a Fusion-io reliability whitepaper from Dell, and the takeaway seems to be "Fusion-io cards have lots of internal redundancies... Don't worry about RAID!!".


Source: (StackOverflow)

Linux - real-world hardware RAID controller tuning (scsi and cciss)

Most of the Linux systems I manage feature hardware RAID controllers (mostly HP Smart Array). They're all running RHEL or CentOS.

I'm looking for real-world tunables to help optimize performance for setups that incorporate hardware RAID controllers with SAS disks (Smart Array, Perc, LSI, etc.) and battery-backed or flash-backed cache. Assume RAID 1+0 and multiple spindles (4+ disks).

I spend a considerable amount of time tuning Linux network settings for low-latency and financial trading applications. But many of those options are well-documented (changing send/receive buffers, modifying TCP window settings, etc.). What are engineers doing on the storage side?

Historically, I've made changes to the I/O scheduling elevator, recently opting for the deadline and noop schedulers to improve performance within my applications. As RHEL versions have progressed, I've also noticed that the compiled-in defaults for SCSI and CCISS block devices have changed as well. This has had an impact on the recommended storage subsystem settings over time. However, it's been a while since I've seen any clear recommendations, and I know that the OS defaults aren't optimal. For example, it seems that the default read-ahead buffer of 128 KB is extremely small for a deployment on server-class hardware.

The following articles explore the performance impact of changing read-ahead cache and nr_requests values on the block queues.

http://zackreed.me/articles/54-hp-smart-array-p410-controller-tuning
http://www.overclock.net/t/515068/tuning-a-hp-smart-array-p400-with-linux-why-tuning-really-matters
http://yoshinorimatsunobu.blogspot.com/2009/04/linux-io-scheduler-queue-size-and.html

For example, these are suggested changes for an HP Smart Array RAID controller:

echo "noop" > /sys/block/cciss\!c0d0/queue/scheduler    # let the controller do the reordering; bypass the kernel elevator
blockdev --setra 65536 /dev/cciss/c0d0                  # read-ahead of 65536 sectors (512-byte units, i.e. 32 MB)
echo 512 > /sys/block/cciss\!c0d0/queue/nr_requests     # deepen the block-layer request queue
echo 2048 > /sys/block/cciss\!c0d0/queue/read_ahead_kb  # read-ahead expressed in KB on the block queue

What else can be reliably tuned to improve storage performance?
I'm specifically looking for sysctl and sysfs options in production scenarios.
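
Beyond the per-device settings above, a couple of VM writeback sysctls are often examined alongside them, and the sysfs values need to be re-applied at boot. A minimal sketch, assuming the same cciss device as above (the values are illustrative, not recommendations):

# writeback behaviour, often tuned together with a battery/flash-backed write cache
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=15

# one simple way to persist the sysfs settings across reboots
cat >> /etc/rc.local <<'EOF'
echo noop > /sys/block/cciss!c0d0/queue/scheduler
echo 512  > /sys/block/cciss!c0d0/queue/nr_requests
echo 2048 > /sys/block/cciss!c0d0/queue/read_ahead_kb
blockdev --setra 65536 /dev/cciss/c0d0
EOF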


Source: (StackOverflow)

Storing and backing up 10 million files on Linux

I run a website where about 10 million files (book covers) are stored in 3 levels of subdirectories, each level ranging over [0-f]:

0/0/0/
0/0/1/
...
f/f/f/

This leads to around 2400 files per directory, which is very fast when we need to retrieve one file. This is, moreover, a practice suggested by many questions here.

However, when I need to backup these files, it takes many days just to browse the 4k directories holding 10m files.

So I'm wondering if I could store these files in a container (or in 4k containers), each of which would act exactly like a filesystem (some kind of mounted ext3/4 container?). I guess this would be almost as efficient as accessing a file directly in the filesystem, and it would have the great advantage of being very efficient to copy to another server.
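
A minimal sketch of the container idea, assuming one sparse image file per top-level directory (paths and sizes are made up):

dd if=/dev/zero of=/srv/covers-0.img bs=1M count=0 seek=20480   # 20 GB sparse image file
mkfs.ext4 -F /srv/covers-0.img
mkdir -p /mnt/covers/0
mount -o loop /srv/covers-0.img /mnt/covers/0    # behaves like a normal ext4 filesystem
# backing up is then one large sequential transfer instead of millions of small files
rsync --inplace /srv/covers-0.img backup-host:/srv/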

Any suggestion on how to do this best? Or any viable alternative (NoSQL, ...)?


Source: (StackOverflow)

How to move files between two S3 buckets with minimum cost?

I have millions of files in an Amazon S3 bucket and I'd like to move these files to other buckets and folders with minimum cost, or no cost if possible. All buckets are in the same zone.

How could I do it?
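
One common approach today, assuming the AWS CLI is installed and configured, is to let S3 perform server-side copies so the object data never leaves Amazon's network (bucket names below are placeholders):

aws s3 sync s3://source-bucket s3://destination-bucket/target-prefix/
# after verifying the copy, remove the originals to complete the "move"
aws s3 rm s3://source-bucket --recursive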


Source: (StackOverflow)

Tuning iSCSI storage

This is a Canonical Question about iSCSI we can use as a reference.

iSCSI is a protocol that puts SCSI commands as payload into TCP network packets. As such, it is subject to a different set of problems than, say, Fibre Channel. For example, if a link gets congested and the switch's buffers are full, Ethernet will, by default, drop frames instead of telling the host to slow down. This leads to retransmissions, which lead to high latency for a very small portion of storage traffic.

There are solutions for this problem, depending on the client operating system, including modifying network settings. For the following list of OSs, what would an optimal iSCSI client configuration look like? Would it involve changing settings on the switches? What about the storage?

  • VMWare 4 and 5
  • Windows Hyper-V 2008 & 2008r2
  • Windows 2003 and 2008 on bare metal
  • Linux on bare metal
  • AIX VIO
  • Any other OS you happen to think would be relevant
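
For the Linux-on-bare-metal case, a minimal client-side sketch of the kind of tuning being asked about, assuming a dedicated storage NIC named eth1 and a switch and target that also support jumbo frames and flow control:

ip link set eth1 mtu 9000      # jumbo frames; must match the switch ports and the target
ethtool -A eth1 rx on tx on    # Ethernet pause/flow control, if the NIC and switch support it
ss -ti dst <target-ip>         # inspect per-connection retransmit counters towards the target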

Source: (StackOverflow)

Why are enterprise SAS disk enclosures seemingly so expensive?

I will begin by stating that I do not believe this is a duplicate of Why is business-class storage so expensive?.

My question is specifically about SAS drive enclosures, and justifying their expense.

Examples of the types of enclosures I'm referring to are:

  • HP D2700
  • Dell MD1220
  • IBM EXP3524

Each of the above is a 2U direct attached external SAS drive enclosure, with space for around 24 X 2.5" drives.

I'm talking about the bare enclosure, not the drives. I am aware of the difference between enterprise class hard drives and consumer class.

As an example of "ball-park" prices, the HP D2700 (25 X 2.5" drives) is currently around $1750 without any drives (checked Dec 2012 on Amazon US). A low end HP DL360 server is around $2000, and that contains CPU, RAM, motherboard, SAS RAID controller, networking, and slots for 8 X 2.5" drives.

When presenting clients or management with a breakdown of costs for a proposed server with storage, it seems odd that the enclosure is a significant item, given that it is essentially passive (unless I am mistaken).

My questions are:

  1. Have I misunderstood the components of a SAS drive enclosure? Isn't it just a passive enclosure with a power supply, SAS cabling, and space for lots of drives?

  2. Why is the cost seemingly so high, especially when compared to a server? Given all the components that an enclosure does not have (motherboard, CPU, RAM, networking, video), I would expect an enclosure to be significantly less expensive.

Currently our strategy when making server recommendations to our clients is to avoid recommending an external drive enclosure because of the price of the enclosures. However, assuming one cannot physically fit enough drives into the base server, and the client does not have a SAN or NAS available, then an enclosure is a sensible option. It would be nice to be able to explain to the client why the enclosure costs as much as it does.


Source: (StackOverflow)

Best way to test new HDDs for a cheap storage server

I want to build a storage server and bought 10 x 2TB WD Reds. The HDDs just arrived.

Is there any tool you guys use to check for bad drives or to best defend against infant mortality before copying real data onto your disks?

Is it better to check each single HDD, or to test the array (ZFS raid-z2) by copying a lot of data onto it?
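
For reference, a typical per-disk burn-in along these lines would be something like the following (destructive, so only on empty disks; /dev/sdX is a placeholder):

badblocks -wsv /dev/sdX    # destructive four-pattern write/read test of every sector
smartctl -t long /dev/sdX  # then start the drive's extended SMART self-test
smartctl -a /dev/sdX       # once it finishes, check reallocated/pending sector counts and the self-test log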

Thank you for your advice in advance!


Source: (StackOverflow)

What is a Storage Area Network, and what benefits does it have over other storage solutions?

I'm proposing this to be a canonical question about enterprise-level Storage Area Networks.

What is a Storage Area Network (SAN), and how does it work?
How is it different from Network Attached Storage (NAS)?
What are the use cases compared to direct-attached storage (DAS)?
In which way is it better or worse?
Why is it so expensive?
Should I (or my company) use one?


Source: (StackOverflow)

Does an unplugged hard drive used for data archival deteriorate?

If I were to archive data on a hard drive, unplug it, and set it on a (not dusty, temperature-controlled) shelf somewhere, would that drive deteriorate much?

How does the data retention of an unplugged hard drive compare to tapes?


Source: (StackOverflow)

What's the best way to explain storage issues to developers and other users?

When server storage gets low, developers all start to moan: "I can get a 1 TB drive at Walmart for 100 bucks, what's the problem?"

How can the complexities of storage be explained to developers so that they will understand why a 1 TB drive from Walmart just won't work?

P.S. I'm a developer and want to know too :)


Source: (StackOverflow)

What is the difference between SAN, NAS and DAS?

What is the difference between SAN, NAS and DAS?


Source: (StackOverflow)

Linux on VMware - why use partitioning?

When installing Linux VMs in a virtualized environment (ESXi in my case), are there any compelling reasons to partition the disks (when using ext4) rather than just adding separate disks for each mount point?

The only one I can see is that it makes it somewhat easier to see if there's data present on a disk with e.g. fdisk.

On the other hand, I can see some good reasons for not using partitions (other than for /boot, obviously):

  • Much easier to extend disks. Just increase the disk size for the VM (typically in vCenter), rescan the device in the VM, and resize the file system online (a minimal sketch follows this list).
  • No more issues with aligning partitions with underlying LUNs.
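
A minimal sketch of that online-grow workflow after the virtual disk has been enlarged in vCenter, assuming the disk is /dev/sdb with ext4 written directly to it (no partition table):

echo 1 > /sys/class/block/sdb/device/rescan   # have the kernel pick up the new disk size
resize2fs /dev/sdb                            # grow the ext4 filesystem while it is mounted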

I have not found much written on this topic. Have I missed something important?


Source: (StackOverflow)

Is it safe to use consumer MLC SSDs in a server?

We (and by we I mean Jeff) are looking into the possibility of using Consumer MLC SSD disks in our backup data center.

We want to try to keep costs down and usable space up - so the Intel X25-Es are pretty much out at about $700 each for 64GB of capacity.

What we are thinking of doing is buying some of the lower-end SSDs that offer more capacity at a lower price point. My boss doesn't think spending about $5k for disks in servers running out of the backup data center is worth the investment.

These drives would be used in a 6 drive RAID array on a Lenovo RD120. The RAID controller is an Adaptec 8k (rebranded Lenovo).

Just how dangerous of an approach is this and what can be done to mitigate these dangers?
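
As one small mitigation often paired with a setup like this, drive wear and reallocation counters can be watched over time; a sketch, assuming direct SATA access (behind the Adaptec controller, smartctl may need a -d option, and attribute names vary by vendor):

smartctl -a /dev/sda | egrep -i 'wear|reallocat|media'   # e.g. Intel drives expose a Media_Wearout_Indicator attribute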


Source: (StackOverflow)