PPIO

PPIO is a library for doing large block I/O efficiently. The API was designed to narrow the gap between implementors of distributed I/O software systems and HPC developers who require high-performance I/O. It encourages a method of organizing I/O routines that integrates them with, rather than separates them from, the application program's main processing routines.

Note that the API is very low level, operating on byte streams. It does not include any of the higher-level facilities that NetCDF or HDF5 provide, and inquiries in that regard are missing the point. The experiment here is to investigate what we should be using in place of open(2) and close(2), not (e.g.) metadata storage and access.
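To make that concrete, here is a rough sketch of the POSIX idiom next to its PPIO counterpart. The PPIO calls are borrowed from the example under Documentation below; treat the exact signatures as illustrative rather than authoritative.

  #include <fcntl.h>  /* open */
  #include <unistd.h> /* pread, close */

  /* POSIX: descriptor-oriented; the kernel learns about each read only as it
   * is issued. */
  char buffer[4096];
  int fd = open("input_vol", O_RDONLY);
  if(fd >= 0) {
    pread(fd, buffer, sizeof(buffer), 0);
    close(fd);
  }

  /* PPIO: range-oriented; the byte ranges of interest are declared up front. */
  void* map = open_range("input_vol", PPIO_RDONLY, 0, 4096);
  if(map != NULL) {
    /* ... fill an array of ppio_iovec_ts; pull each range via readonev ... */
    close_range(map);
  }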

As such, the intended audience is currently limited to those doing research. One should consider the specification malleable, and feel encouraged to modify it.

Download

You can download the current version from here.

If you'd like to compile it with the included makefile, you'll need the Check library, which is used for the unit tests. For normal use, we recommend you simply copy the files into your source tree, for now.

Documentation

At present the best form of documentation is the PPIO specification. However, it is admittedly very terse and written in standardese (by design). Another guide to how the library should be used is the test.c file included in the tarball; it contains the tests that verify the library is operating correctly. All that said, a 5-second intro to the library would be:
  1. Open a file via open_range.
  2. Create a list of all the byte ranges you will need in the near future, and stuff that into an array of ppio_iovec_ts (the structure is sketched just after this list).
  3. Move the byte range you need right now to the first element of that array, and then pass it to readonev.
  4. Process your data as normal.
  5. Knock every element in the ppio_iovec_t array down a notch, deleting the first element.
  6. Repeat.
  7. Close the file via close_range.
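As for what a ppio_iovec_t actually is: the authoritative definition lives in the specification, but judging from how the fields are used in the example below, it carries at least a byte offset and a length, along these lines:

  /* Presumed shape of ppio_iovec_t, inferred from its use in the example
   * below; consult the PPIO specification for the real definition. */
  struct ppio_iovec_t {
    size_t offset; /* byte offset of the range within the file */
    size_t length; /* number of bytes in the range */
  };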
To make things a little more explicit, here's a sketch of a simple function that performs a thresholding operation on a 64^3 floating-point volume.

  #include <errno.h>   /* errno */
  #include <stdint.h>  /* uint8_t */
  #include <stdio.h>   /* fprintf */
  #include <stdlib.h>  /* exit */
  #include <string.h>  /* memcpy */
  #include "ppio.h"    /* assumed header name for open_range and friends */

  const size_t dims[3] = { 64, 64, 64 };
  size_t total_length = dims[0]*dims[1]*dims[2] * sizeof(float);

  void* map_in = open_range("input_vol", PPIO_RDONLY, 0, total_length);
  if(map_in == NULL) {
    fprintf(stderr, "error (%d) occurred accessing input data.\n", errno);
    return;
  }

  /* we're only outputting an 8bit volume, so divide by sizeof(float). */
  void* map_out = open_range("output_vol", PPIO_WRONLY, 0,
                             total_length/sizeof(float));
  if(map_out == NULL) {
    fprintf(stderr, "error (%d) occurred accessing output data.\n", errno);
    close_range(map_in);
    return;
  }

  /* We assume the file is a linear array, with X varying slowest.
   * We'll access it in 4 equi-sized chunks for explanatory purposes.  In a real
   * application, your data would be bricked and each brick would be a chunk.
   * Furthermore, you would want each brick to be much larger, say 256^3. */
  struct ppio_iovec_t ibricks[4];
  struct ppio_iovec_t obricks[4]; /* need them for the output bricks too! */

  const size_t n_chunks = 4;
  const size_t slowest_dim_size = dims[0] / n_chunks;
  const size_t n_elems = dims[2]*dims[1]*slowest_dim_size; /* per brick */

  ibricks[0].length = ibricks[1].length = ibricks[2].length = ibricks[3].length
    = n_elems * sizeof(float);

  ibricks[0].offset = 0 * (n_elems * sizeof(float));
  ibricks[1].offset = 1 * (n_elems * sizeof(float));
  ibricks[2].offset = 2 * (n_elems * sizeof(float));
  ibricks[3].offset = 3 * (n_elems * sizeof(float));

  obricks[0].length = obricks[1].length = obricks[2].length = obricks[3].length
    = n_elems * sizeof(uint8_t);

  obricks[0].offset = 0 * n_elems;
  obricks[1].offset = 1 * n_elems;
  obricks[2].offset = 2 * n_elems;
  obricks[3].offset = 3 * n_elems;

  /* now we've configured all our iovecs, let's process each brick. */
  size_t len = 4; /* number of iovecs in our arrays. */
  for(size_t i=0; i < 4 /* num bricks */; ++i) {
    /* get the data */
    const float* in = (const float*) readonev(map_in, ibricks, len);
    uint8_t* out = (uint8_t*) readonev(map_out, obricks, len);

    if(in == NULL || out == NULL) {
      fprintf(stderr, "severe error processing bricks...\n");
      exit(EXIT_FAILURE);
    }

    /* process each element of the data.  We're just trying to output a binary
     * volume of 1's and 0's. */
    for(size_t j=0; j < n_elems; ++j) {
      *out = (19 < *in && *in < 42) ? 1 : 0;
      ++in;
      ++out;
    }

    /* Fix our iovec arrays for the next round; delete the first element, we
     * already processed it. */
    for(size_t k=0; k < len-1; ++k) {
      memcpy(&ibricks[k+0], &ibricks[k+1], sizeof(struct ppio_iovec_t));
      memcpy(&obricks[k+0], &obricks[k+1], sizeof(struct ppio_iovec_t));
    }

    /* We have one less element to process. */
    --len;
  }

  /* Tell the I/O system we're done. */
  finished(map_in);
  finished(map_out);

  /* Preferably we wouldn't do this right here, but wait until much later.  For
   * example, if all of the above was a function we called once per resolution
   * in a multiresolution data set, we might close_range on resolution n-1 here,
   * if we consider the resolution we *just* processed to be resolution 'n'.
   * For an application which does not need to open many files, we might
   * consider doing this from a registered atexit() call. */
  close_range(map_in);
  close_range(map_out);
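If you do go the atexit() route suggested in that last comment, the bookkeeping might look something like the sketch below. The g_maps table and close_all_ranges() are hypothetical helpers, not part of PPIO; only close_range() comes from the library.

  #include <stdlib.h> /* atexit */

  /* Hypothetical bookkeeping: record each mapping as it is opened, and close
   * everything from a single handler registered to run at program exit. */
  static void* g_maps[16];
  static size_t g_n_maps = 0;

  static void close_all_ranges(void) {
    for(size_t i=0; i < g_n_maps; ++i) {
      close_range(g_maps[i]);
    }
  }

  /* once, at startup: */
  atexit(close_all_ranges);

  /* after each successful open_range: */
  g_maps[g_n_maps++] = map_in;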