Topic: A Challenge

I am after a sorting algorithm written in Perl to perform the following task:

I have a variable number of files within a directory (anything from 0 to a couple of hundred). They are named very specifically in the following manner:

document1_rev-0
document2_rev-0
document2_rev-A
document2_rev-B
document3_rev-0
document4_rev-0
document4_rev-A

The documentx part of the file name could be anything. The _rev-x part always starts at _rev-0 (zero) and, each time the document is revised, changes to the next letter from A through to Z (it is highly unlikely ever to reach Z). I want the algorithm to return a list showing only the latest revisions, i.e. the list above should appear as:

document1_rev-0
document2_rev-B
document3_rev-0
document4_rev-A

There are two ways this can be achieved: either keep a dynamic list updated as each file is read from disk in turn, or read all files into an array (or something similar) and then run the algorithm on the array to strip out old revisions. (Due to the nature/use of the files I do not want to use a database to perform the above.)
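As a rough illustration of the first option (the dynamic list), here is a minimal sketch that keeps a hash keyed on the document name and overwrites an entry whenever a later revision turns up. The sub name latest_revisions and the optional file extension in the pattern are assumptions made for the example, not part of the requirement. It relies on '0' sorting before 'A' to 'Z' in a plain string comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: given a list of file names, return the latest
# revision of each document, ordered by document name.
sub latest_revisions {
    my @names = @_;
    my %latest;    # document name => { rev => revision, file => full name }

    for my $file (@names) {
        # Greedy (.*) splits at the LAST "_rev-" in case the document
        # part itself contains that text. An extension is optional
        # (assumption: the listing in the post shows none).
        next unless $file =~ /^(.*)_rev-([0A-Z])(?:\.\w+)?\z/;
        my ($doc, $rev) = ($1, $2);

        # '0' lt 'A' lt 'B' ... in ASCII, so a string compare is enough.
        if (!exists $latest{$doc} || $rev gt $latest{$doc}{rev}) {
            $latest{$doc} = { rev => $rev, file => $file };
        }
    }
    return map { $latest{$_}{file} } sort keys %latest;
}

my @files = qw(
    document1_rev-0 document2_rev-0 document2_rev-A document2_rev-B
    document3_rev-0 document4_rev-0 document4_rev-A
);
print "$_\n" for latest_revisions(@files);
```

Run on the seven names above, this prints the four names in the wanted list. Each file name is looked at once, so the work grows in step with the number of files rather than with its square.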

I will pay the princely sum of £10 (in sterling or Amazon book tokens, should anyone not wish to disclose an address for it to be sent to) for the solution which I believe is most efficient. This would be determined by trial and error, as I probably wouldn't be capable of determining it any other way. My decision would be final and not up for debate, although I would expect, if other postings are anything to go by, that a debate would rage on afterwards to the point where you will be counting clock cycles and how much further one particular electron had travelled around its silicon orbit.

On a serious note, I look forward to receiving some feedback.

Paul M

Re: A Challenge

Hmmm, why not keep a small db where you store the name and revision in separate fields? Add records after generation or upload; that's pretty easy to do.

To read the latest revision filenames, it'll be something like
select name_field, max(revision_field) as revision_field from document_database group by name_field
(or drop the max and group by if you want them all). You then compose the real filename as name_field."_rev-".revision_field

As an addition, you could serve the files directly to the requester, avoiding the need to give a direct link to the real file. If you need details, ask.

Now, my VAT number is... ;)

Re: A Challenge

MarcB

It is to do with the way the site is updated and the type of files. The files are AutoCAD dwf files which are viewed through the browser using a custom plug-in. The plug-in can only reference and view the files if they are served up following strict rules.

Paul M

Re: A Challenge

Hmmmm, if the plug-in is custom I'm not sure; it might not work. What I have done is serve files that the user already has a program to view/interact with, for example pdf files if you have an Acrobat viewer. That works great because if they don't, they receive the pdf file, which they can save and view later on. Of course, if these files are proprietary (and thus not really meant to be saved), you've got far more problems than I do ;)

Anyway, the proposed solution should work fine; just slightly modify the act of submitting a file: instead of copying it to the server or sending it by ftp, use an upload php page and update the page at the same time.

You can even add ease of use with 'add new doc' (and then ask for a name) or 'update existing doc' (and then reuse all the fields you already have, including description etc.). And of course you can filter the file types and everything. In the end, you get much more control.

So, I hope this helps. If you don't see it as a solution, please restate the details (somewhat differently, that is ;) ), and if you're using something else to work this out, please let us know. I am always happy to learn new ways to do things :)

Marc

Re: A Challenge

I have written some code myself that works fine. Can anyone suggest any way of making it more efficient?

#define arrays
   @dwfs1 = ();
   @dwfs2 = ();
   @dwfs3 = ();
   
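#read contents of directory into first array (DH is assumed to be already opened with opendir)
   while (defined($_ = readdir(DH))){
      next unless /_rev-/;   #skip ".", ".." and anything not named as a revision
      push(@dwfs1, $_);
   }
   closedir(DH);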

#create a second array containing the old revisions
   foreach $item1(@dwfs1){
      @dwf1 = split(/_rev-/, $item1);
      $popped = 0;
      foreach $item2(@dwfs1){
         @dwf2 = split(/_rev-/, $item2);
         if ($dwf1[0] eq $dwf2[0] && $dwf1[1] lt $dwf2[1] && $popped == 0) {
            push(@dwfs2, $item1);
            $popped = 1;
         }
      }
   }
   
#store unique items (i.e. latest revs) in third array
   foreach $item1(@dwfs1){
      $match = 0;
      foreach $item2(@dwfs2){
         if($item1 eq $item2){
            $match = 1;
         }
      }
      if ($match == 0){
         push(@dwfs3, $item1);
      }
   }

PM
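
One possible way to avoid the nested loops while still reading everything into an array first is to sort the names and keep the first entry seen for each document. The sketch below is only an illustration under a few assumptions: the sub name latest_in_dir is made up, the directory is opened inside the sub rather than through the DH handle, and an optional file extension after the revision letter is tolerated.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the code above): list $dir and return
# only the newest revision of each document.
sub latest_in_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = readdir($dh);
    closedir($dh);

    # Split each name into [document, revision, full name]. Names that
    # do not match the pattern (including "." and "..") are dropped.
    my @parsed = map { /^(.*)_rev-([0A-Z])(?:\.\w+)?\z/ ? [$1, $2, $_] : () } @names;

    # Order by document name, then by revision with the newest first
    # ('0' sorts before 'A'..'Z', so a string compare is enough).
    my @sorted = sort { $a->[0] cmp $b->[0] or $b->[1] cmp $a->[1] } @parsed;

    # The first entry seen for each document is its latest revision.
    my (@latest, $prev);
    for my $entry (@sorted) {
        next if defined $prev && $entry->[0] eq $prev;
        push @latest, $entry->[2];
        $prev = $entry->[0];
    }
    return @latest;
}

# Example call on the current directory.
print "$_\n" for latest_in_dir('.');
```

The sort costs on the order of n log n comparisons for n files, against roughly n squared for the two nested loops, which should hardly matter for a couple of hundred files but does keep the code shorter.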