Tuesday, September 30, 2014

How to Resize a File in uniVerse



MEMO file used in this example

    Resizing is a function which can be performed by any user with access to an account. Since it's not something a normal user would stumble over without some searching, this shouldn't be a problem. However, resizing should be understood before it is performed, and it should be done with no one on the account, or at least with certainty that no one is accessing the file being resized. In newer releases of uniVerse, a resize can't be done if the file is open by any process, whether background or user-initiated. So, here are the steps to follow, assuming the issues above are known and controlled.

  1. From TCL, enter ANALYZE.FILE MEMO
  2. This will return data which is significant in determining the new size. Here's a sample of what may come back:

    >ANALYZE.FILE MEMO
    File name                               = MEMO
    File type                               = 18
    Number of groups in file (modulo)       = 1009
    Separation                              = 1
    Number of records                       = 10125
    Number of physical bytes                = 46554112
    Number of data bytes                    = 37423452
    
    Average number of records per group     = 10.0347
    Average number of bytes per group       = 37089.6452
    Minimum number of records in a group    = 5
    Maximum number of records in a group    = 20
    
    Average number of bytes per record      = 3696.1434
    Minimum number of bytes in a record     = 20
    Maximum number of bytes in a record     = 427496
    
    Average number of fields per record     = 76.8433
    Minimum number of fields per record     = 1
    Maximum number of fields per record     = 7733
    
    Groups  25%    50%    75%   100%   125%   150%   175%   200% full
              0      0      0      0      0      0      0   1009 
    
    
    The above example gives us all the information we need to resize the file and make it usable and fast for now. The only missing information, which the client will have to provide and should know, is how much growth they expect over a specified period of time. That period runs until the next time they expect to resize the file.
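    If you capture the ANALYZE.FILE output to a text file, the fields can be pulled out programmatically. The Python sketch below is a minimal, hypothetical helper (parse_analyze_file is not part of uniVerse); it assumes each line has the "label = value" layout shown in the sample, and other releases may format the report differently.

```python
import re

# "Label = value" lines, as in the ANALYZE.FILE sample above (assumed layout).
_FIELD = re.compile(r"^\s*(?P<label>[^=]+?)\s*=\s*(?P<value>\S+)\s*$")


def parse_analyze_file(text: str) -> dict[str, str]:
    """Return {label: value} for every 'label = value' line in the report."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group("label")] = match.group("value")
    return fields


# A few lines copied from the sample output above.
SAMPLE = """\
>ANALYZE.FILE MEMO
File name                               = MEMO
File type                               = 18
Number of groups in file (modulo)       = 1009
Separation                              = 1
Number of data bytes                    = 37423452
Average number of bytes per record      = 3696.1434
"""

if __name__ == "__main__":
    info = parse_analyze_file(SAMPLE)
    data_bytes = int(info["Number of data bytes"])
    separation = int(info["Separation"])
    print(info["File name"], data_bytes, separation)  # MEMO 37423452 1
```

    Lines without an equals sign, such as the command echo and the "Groups ... full" histogram, are simply skipped.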

  3. Analyze the data returned by the ANALYZE.FILE command.

  4. The data of interest is:

    a. File type: 18
       This is fine for this file. For files whose keys are wholly numeric, other file types can be used; however, type 18 has proven to be very fast and stable. [Note: A discussion of using Dynamic Files (type 30) is not included in this document.]

    b. Modulo: 1009
       This will be used as a reference to see how badly sized the file is.

    c. Separation: 1
       The separation should be based on the block size the hardware/OS uses at the disk level. On HP, this is usually 2048 bytes per block; IBM uses 1024-byte blocks. So, depending on your hardware, you need to adjust this accordingly. Each unit of separation is 512 bytes, so a separation of 2 gives a uniVerse block size of 1024 bytes, 4 = 2048, 8 = 4096, etc. You can use any separation you wish; however, it is recommended that you stay within the sizes suited to your hardware.
       [Note: If you are on a Microsoft platform, use a separation of 4, which has proven to be very stable, as a baseline for resizing unless your record size mandates a larger separation.]

    d. Number of data bytes: 37423452
       This number, with expected growth included, is the most important. It is used to calculate the new file size.

    e. Average number of bytes per record: 3696.1434
       This number will be used to determine whether a separation larger than the current one is needed.
       [Note: On Microsoft platforms, the separation can still be modified if desired, but be sure to use a power of 2, e.g. 2, 4, 8, 16, 32. Values larger than 16 become difficult to maintain and are very inefficient where disk usage is a concern.]

    f. Minimum number of bytes in a record: 20
       This number, in conjunction with the previous one, will be used to determine the optimum separation.

    g. Maximum number of bytes in a record: 427496
       Together with the previous two, this is used to determine the separation. How much weight it carries depends on how close it is to the average. Usually this number is so much higher than the average that it's too disproportionate to be considered. If the average and this number are relatively close, then it carries more weight.
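    The separation arithmetic described under Separation can be written down directly. This Python sketch assumes only what the text states, that one unit of separation is 512 bytes; group_bytes and is_power_of_two are illustrative names. The power-of-two check matches the separations listed in the Microsoft note (2, 4, 8, 16, 32).

```python
SEPARATION_UNIT = 512  # bytes per unit of separation, as stated above


def group_bytes(separation: int) -> int:
    """Size in bytes of one primary group (block) for a given separation."""
    if separation < 1:
        raise ValueError("separation must be a positive integer")
    return separation * SEPARATION_UNIT


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, 16, 32, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    for sep in (1, 2, 4, 8, 16, 32):
        print(f"separation {sep:2} -> {group_bytes(sep):5} bytes per group")
```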

    Now for the analysis:


    Note: Too much analysis may become counter-productive. In many cases it's more important just to get the resize done, rather than worrying about what separation to use. If you feel this is your case, just use a separation of 4 when the average sized item is smaller and 8 for those that are larger on average. Also use a file type of 18, which seems to be very efficient, unless a case can be made to use dynamic files (type 30). If you choose this route, you may not need to pay much attention to the next section; however, you accept all responsibility.

      A. Find the best SEPARATION. Use the Average from e. above as the rule, but consider how close it is to the minimum and maximum in f. and g. In this case, it's closer to the minimum than the maximum, so we can surmise that the maximum is an exception rather than the rule. The rule of thumb for determining the separation is to fit somewhere between 3 and 10 records per group. At 3700 bytes per record, we can determine that the separation might best be 16, which makes each of the primary buffers in the file (blocksize) 8192 bytes. This is not unusual for the MEMO file. If you follow this process on the FISCAL file, you will find different results.
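    The 3-to-10 records-per-group rule of thumb can be checked for several candidate separations at once. This is a rough sketch that compares an average-sized record against the primary group size only and ignores overflow; the function names are hypothetical. With the sample's average of about 3696 bytes, separation 16 works out to about 2.2 records per 8192-byte group and separation 32 to about 4.4, so the printout is a quick way to see where each candidate falls relative to the rule of thumb.

```python
AVG_RECORD_BYTES = 3696.1434  # "Average number of bytes per record" (sample)


def records_per_group(separation: int, avg_record_bytes: float) -> float:
    """How many average-sized records fit in one primary group."""
    return separation * 512 / avg_record_bytes


def within_rule_of_thumb(separation: int, avg_record_bytes: float,
                         low: float = 3.0, high: float = 10.0) -> bool:
    """Check the 3-to-10 records-per-group rule of thumb from step A."""
    return low <= records_per_group(separation, avg_record_bytes) <= high


if __name__ == "__main__":
    for sep in (4, 8, 16, 32, 64):
        rpg = records_per_group(sep, AVG_RECORD_BYTES)
        ok = within_rule_of_thumb(sep, AVG_RECORD_BYTES)
        print(f"separation {sep:2}: {rpg:4.1f} records/group, 3-10: {ok}")
```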

      B. Calculate the correct MODULO based on the SEPARATION from step A. above. Divide the Number of data bytes (37423452) by the group size in bytes, which is the SEPARATION times 512 (16 * 512, or 8192). For this example, that equals 4568.29.
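    Step B as a small Python function, a sketch using the sample's numbers (base_modulo is a hypothetical name). The second call shows, for reference, how many 512-byte groups the current data would need at the existing separation of 1, compared with the file's actual modulo of 1009.

```python
def base_modulo(data_bytes: int, separation: int) -> float:
    """Step B: groups needed to hold the current data at this separation."""
    return data_bytes / (separation * 512)


if __name__ == "__main__":
    print(round(base_modulo(37423452, 16), 2))  # 4568.29
    # At the current separation of 1 the data would need about 73,093
    # groups, against the existing modulo of 1009.
    print(round(base_modulo(37423452, 1)))
```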

      C. Add the growth percentage the client provides. This example assumes 20 percent growth between now and the next resize in, say, six months. So, multiply 4568.29 by 1.2. No, wait. I usually allow for the oversized records and the overflow area the large records will require, so I typically add about 10% to the file size myself. So, take 4568.29 and multiply it by 1.30, which gives a modulo of 5938.777, or 5939 when rounded up.
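    Step C in Python, assuming the growth and overflow allowances are simply added together as in the example and the result is rounded up to a whole number of groups (sized_modulo is a hypothetical name):

```python
import math


def sized_modulo(base: float, growth: float, allowance: float = 0.10) -> int:
    """Step C: scale the base modulo by expected growth plus an extra
    allowance for oversized records/overflow, rounding up to whole groups."""
    return math.ceil(base * (1 + growth + allowance))


if __name__ == "__main__":
    print(sized_modulo(4568.29, growth=0.20))  # 5939
```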
      NOTE: If you are going to add a large amount of data to a file that is currently sized appropriately, calculate the percentage increase the new data represents relative to the existing data, then calculate the modulo accordingly. For example, if you are adding 5,000 records to the file in our example, which holds 10,125 records (an increase of about half), multiply 4568.29 by 1.6: the 50% increase plus a 10% growth allowance.
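    The percentage for a large data load can be estimated from record counts, assuming the new records are about the same average size as the existing ones. This sketch reproduces the example's 1.6 factor by rounding the increase to the nearest tenth, the way the text rounds it to "half"; the helper name is hypothetical.

```python
def growth_from_new_records(existing_records: int, new_records: int) -> float:
    """Fractional increase in data, assuming new records are about the same
    average size as the existing ones."""
    return new_records / existing_records


if __name__ == "__main__":
    increase = growth_from_new_records(10125, 5000)   # about 0.49
    factor = 1 + round(increase, 1) + 0.10            # "by half" plus 10%
    print(round(factor, 2), round(4568.29 * factor, 2))
```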

      D. Round the modulo to a prime number. At TCL, type PRIME 5939 to find the nearest primes. You will get the following response:

      >PRIME 5939
      Next lower prime number: 5939.
      Next higher prime number: 5939.
      

      That isn't going to happen very often (the number you calculated being a prime number itself). Okay, back to work... We now have the sizes to use when we perform the RESIZE command.
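    If you want to check a number ahead of time, the next higher prime can be found with simple trial division. This Python sketch mirrors only the "Next higher prime number" line of the PRIME output (PRIME also reports the next lower prime); the function names are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial division; fast enough for numbers the size of a modulo."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime_at_or_above(n: int) -> int:
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


if __name__ == "__main__":
    print(next_prime_at_or_above(5939))  # 5939, already prime
```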

  5. Check the available disk space. Before you resize, determine whether there's enough disk space in the filesystem or partition where the current file resides. When you resize, uniVerse creates a temporary file whose name starts with resize; when I ran it for this example, it was named resize9e6151 (yours will have different numbers or letters after the word resize). That file becomes the new file when the process is completed, so you must have room for both the new file and the old file for the resize to succeed.
  6. RESIZE the file. If you have enough room, here's the command you would use:

    >RESIZE MEMO * 5939 16

    where the * tells the process to keep the current parameter, in this case the file type. In our example, that's type 18, which is fine.
    If you don't have enough disk space on the filesystem or partition, but do on another filesystem or partition, here's the command you would use:

    >RESIZE MEMO * 5939 16 USING /u5
    
    
    Check beforehand that /u5 has plenty of space for the new file.

    NOTE: On Microsoft platforms, the USING keyword takes a path such as D:\directory rather than /u5.

    For example:
          RESIZE MEMO * 5939 16 USING D:\WINNT\TEMP\
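    Before running RESIZE, you can compare the free space on the target filesystem with a rough size estimate for the new file. This Python sketch uses shutil.disk_usage from the standard library; the size estimate (primary groups of modulo × separation × 512 bytes plus an assumed 10% for overflow, ignoring header overhead) is an assumption, not how uniVerse sizes the file, so leave a generous margin. Pass the directory that will hold the resize file: the current file's directory, or the USING directory such as /u5 or D:\WINNT\TEMP\. If that is the same filesystem as the old file, the old file is already counted as used space, so the free space only needs to cover the new file.

```python
import shutil

GROUP_UNIT = 512  # bytes per unit of separation


def estimated_new_file_bytes(modulo: int, separation: int,
                             overflow_allowance: float = 0.10) -> int:
    """Rough estimate: primary groups plus an assumed overflow allowance.
    File header and other overhead are ignored (assumption)."""
    primary = modulo * separation * GROUP_UNIT
    return int(primary * (1 + overflow_allowance))


def has_room(target_dir: str, modulo: int, separation: int) -> bool:
    """True if the filesystem holding target_dir has more free space than
    the estimate for the new file."""
    free = shutil.disk_usage(target_dir).free
    return free > estimated_new_file_bytes(modulo, separation)


if __name__ == "__main__":
    need = estimated_new_file_bytes(5939, 16)
    print(f"estimated new file: {need / 2**20:.1f} MiB")
    # "." is a placeholder; use the MEMO directory or the USING directory.
    print("enough room:", has_room(".", 5939, 16))
```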


    The process creates the new file, either in the same directory as MEMO or in the directory specified after the USING keyword. It then copies the data from the old file to the resizeNNN file one item at a time. As each item is copied, a flag is set in the old file so the process knows where to go next. The groups are processed in sequential order, setting these resize flags, until all the records are copied. Then the old file is deleted and the new file is moved into place, being renamed from resize9e6151 (in this sample) to MEMO.
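    To make the copy-and-flag description concrete, here is a toy in-memory model in Python. It is purely illustrative: the hash function, the Record and ToyHashedFile classes, and toy_resize are all hypothetical, and the real type 18 hashing and on-disk flag format are different. It only shows the shape of the process: walk the old groups in order, write each item into a file with the new modulo, and flag the source item as copied.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    key: str
    data: str
    copied: bool = False  # stands in for the per-item resize flag


@dataclass
class ToyHashedFile:
    modulo: int
    groups: list = field(init=False)

    def __post_init__(self) -> None:
        self.groups = [[] for _ in range(self.modulo)]

    def group_for(self, key: str) -> int:
        # Hypothetical toy hash; NOT the real type 18 hashing algorithm.
        return sum(ord(c) for c in key) % self.modulo

    def write(self, record: Record) -> None:
        self.groups[self.group_for(record.key)].append(record)

    def count(self) -> int:
        return sum(len(g) for g in self.groups)


def toy_resize(old: ToyHashedFile, new_modulo: int) -> ToyHashedFile:
    """Copy every record, group by group, into a file with a new modulo,
    flagging each source record as it is copied."""
    new = ToyHashedFile(new_modulo)
    for group in old.groups:          # groups are walked in sequential order
        for rec in group:
            new.write(Record(rec.key, rec.data))
            rec.copied = True         # progress marker in the old file
    # In uniVerse, the old file is then deleted and the new one renamed.
    return new


if __name__ == "__main__":
    old = ToyHashedFile(3)
    for i in range(20):
        old.write(Record(f"K{i}", f"data {i}"))
    new = toy_resize(old, 7)
    print(old.count(), new.count())  # 20 20
```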

    NOTE: If you run the resize as a superuser or administrator, you must check the permissions to verify that the new file has read/write permissions set appropriately for the users who will need access. This is especially true on Hewlett-Packard (HP-UX) Unix systems.
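    On Unix, a quick way to inspect the result after resizing as root is to look at the owner, group and mode of the new file. This Python sketch uses os.stat and stat.filemode from the standard library; the required bits (owner and group read/write) are an assumed site policy, so adjust them to match how your users reach the file. It applies to a static hashed file such as type 18, which is a single operating-system file; a dynamic (type 30) file is a directory, so you would check the files inside it.

```python
import os
import stat
import sys

# Assumed site policy: owner and group need read/write access.
REQUIRED = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP


def owner_and_mode(path: str) -> tuple[int, int, str]:
    """Return (uid, gid, ls-style mode string such as '-rw-rw-r--')."""
    st = os.stat(path)
    return st.st_uid, st.st_gid, stat.filemode(st.st_mode)


def permissions_ok(path: str, required: int = REQUIRED) -> bool:
    """True if every permission bit in `required` is set on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & required) == required


if __name__ == "__main__":
    # Usage: python check_perms.py /path/to/account/MEMO  (hypothetical path)
    for path in sys.argv[1:]:
        uid, gid, mode = owner_and_mode(path)
        print(f"{path}: uid={uid} gid={gid} {mode} ok={permissions_ok(path)}")
```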

    Depending on how large the file is, what the new file type is (if it is being changed), and what the separation is, this process could take a very long time. That's one reason I recommend keeping files to a manageable size. It might become necessary to distribute files to make that possible.

    If the resize is interrupted for any reason, including causes outside your control, the following command must be run on the file before it can be used:

    filepeek MEMO

    which will give some information about the file, then prompt with
    Addr:

    where you will type RCL to reset the resize bits. It will prompt you to enter Y to continue or N to stop. Enter Y.

    You must be superuser to execute filepeek. Do not do anything else in this program, because it alters data at the disk level and its changes can't be undone. Fixes resulting from misuse of this program will be billable at the current emergency coverage rates. Please don't abort the resize unless absolutely necessary; you will probably lose an item or two if you do. If the process is aborted by a power outage, you can expect multiple corruptions in the file and perhaps the loss of many records.

  7. Go home and eat dinner.
  8. You're finished.
