About This Document

Rainfall Datasets

Data Formats and Processing

Tools

 

 

 

About This Document

 

This document provides a concise overview of my work to date on developing the Dhofar Geographic Information System (GIS) and preparing its underlying datasets, namely rainfall data acquired from NASA’s Tropical Rainfall Measuring Mission (TRMM) (background information about the project is available here). The document is organized into guidelines by category, intended for whoever will use or further develop the system.

 

After listing the datasets in use (description, source, and specification), the document provides data processing guidelines and information on the formats, tools, and methodologies used. Finally, it introduces the GIS itself (views, theme grouping, directory structure, and usage tips) as implemented in its first version of May 2003.

 

All http hyperlinks provided herein are functional as of the date of writing; all implicit hyperlinks link to local pages on the Dhofar External Hard Drive (HD, hereinafter) and are functional provided the file structure and integrity of the storage media are left intact.

 

Please contact me (Loai@Loai-Naamani.com) for any additional information.

 

(Please note that a similar document detailing the processing of vegetation and elevation data is provided by Hongfei Tian.)

 

 

 

Rainfall Datasets

 

A listing of all TRMM products has been compiled into a spreadsheet, available here.

Extensive documentation of all data products can be accessed at: http://trmm.gsfc.nasa.gov/data_dir/ProductStatus.html

All dataset downloads can be scheduled and managed at: http://lake.nascom.nasa.gov/data/dataset/TRMM/

Instructions on visualizing the data in its raw format using TRMM tool(s) are available at:

http://daac.gsfc.nasa.gov/CAMPAIGN_DOCS/hydrology/hd_software.shtml

(The TSDIS Orbit Viewer has been tested and is recommended.)

 

Four datasets are incorporated into the Dhofar GIS Version 1.0 (GIS, hereinafter). The data of interest fall within the years 1998-2002 and the months of May-September:

 

·        3B42 (Calibrated geosynchronous IR rain rate using TRMM estimates; daily; 1x1 degrees)

 

The purpose of Algorithm 3B-42 is to produce Tropical Rainfall Measuring Mission (TRMM)-adjusted merged-infrared (IR) precipitation and root-mean-square (RMS) precipitation-error estimates.  These gridded estimates are on a 1-day temporal resolution and a 1-degree by 1-degree spatial resolution in a global belt extending from 40 degrees south to 40 degrees north latitude.

 

To do this, Algorithm 3B-42 computes the monthly IR calibration parameters. 3B-42 processes the VIRS data (1B-01) and the TMI data (2A-12) one granule (orbit) at a time.  The scan data for each corresponding VIRS and TMI granule are accumulated and gridded over the entire orbit (or partial orbit if a month boundary is encountered).  After the partial or complete orbit is gridded and accumulated, 3B-42 computes the granule average of the accumulated orbit of VIRS and TMI precipitation data, then clips the orbit-average VIRS and TMI data to coincident observations, and accumulates these averaged, clipped observations.  After a calendar month of orbit averages of clipped VIRS and clipped TMI precipitation has been accumulated, the monthly averages of these data are computed.  The monthly average clipped TMI data is then converted to clipped TCI data using the TMI/TCI calibration parameters from Product 3B-31. Using the monthly average clipped VIRS data and the monthly average clipped TCI data, the IR calibration parameters are computed.  These IR calibration parameters are then applied to the merged-IR data (3A-44) to produce the TRMM-adjusted merged-IR precipitation.

 

A complete description of Algorithm 3B-42 is provided in the Algorithm 3B-42 User's Guide available from the TRMM Science Data and Information System (TSDIS).

 

All raw world data has been downloaded for the time frame of 1998-2002 for months May-September (boundaries included) and is provided on the HD in its raw format, along with the relevant processed derivatives. (Refer to the Data Formats and Processing section of this document for further information on the specs of such derivatives.)

 

 

·        3B43 (Merged rain rate from TRMM, geosynchronous IR, SSM/I, rain gauges; monthly; 1x1 degrees)

 

The purpose of Algorithm 3B-43 is to produce the "Tropical Rainfall Measuring Mission (TRMM) and Other Data" best-estimate precipitation rate and root-mean-square (RMS) precipitation-error estimates.  These gridded estimates are on a calendar month temporal resolution and a 1-degree by 1-degree spatial resolution global band extending from 40 degrees south to 40 degrees north latitude.

 

Algorithm 3B-43 is executed once per calendar month to produce the single, best-estimate precipitation rate and RMS precipitation-error estimate field (3B-43) by combining two independent precipitation fields.  These two independent precipitation fields are the daily-average adjusted merged-infrared (IR) estimates (3B-42) and the monthly accumulated Climate Assessment and Monitoring System (CAMS) or Global Precipitation Climatology Centre (GPCC) rain gauge analysis (3A-45).

 

The input rain gauge data are on the calendar month temporal resolution.  To obtain the requisite calendar month average of adjusted merged-IR data, 3B-43 averages the daily adjusted merged-IR data that span the calendar month of interest.  After this preprocessing is complete, the two independent precipitation fields are merged together to form the best-estimate precipitation rate and RMS precipitation-error estimates.
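The averaging step described above can be illustrated with a minimal Python sketch. It is not the 3B-43 implementation: the function name monthly_mean, the list-of-lists grid layout, and the absence of any missing-value handling are all assumptions made for illustration.

```python
from statistics import fmean


def monthly_mean(daily_grids):
    """Average equally shaped daily grids cell by cell (hypothetical helper).

    Assumes every grid is a list of rows and contains no missing-value flags.
    """
    if not daily_grids:
        raise ValueError("need at least one daily grid")
    rows, cols = len(daily_grids[0]), len(daily_grids[0][0])
    return [
        [fmean(day[r][c] for day in daily_grids) for c in range(cols)]
        for r in range(rows)
    ]


# Two made-up "days" on a 2x2 grid, values in mm/hr.
days = [
    [[0.0, 2.0], [4.0, 6.0]],
    [[2.0, 2.0], [0.0, 0.0]],
]
print(monthly_mean(days))
```

In practice the real product also carries an error field and is merged with the gauge analysis; this sketch covers only the preprocessing average.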

 

A complete description of Algorithm 3B-43 is provided in the Algorithm 3B-43 User's Guide available from the TRMM Science Data and Information System (TSDIS).

 

All raw world data has been downloaded for the time frame of 1998-2002 for months May-September (boundaries included) and is provided on the HD in its raw format, along with the relevant processed derivatives. (Refer to the Data Formats and Processing section of this document for further information on the specs of such derivatives.)

 

 

·        2A12 (TMI, TRMM Microwave Imager; Hydrometeor – cloud liquid water, precipitation water, cloud ice, precipitation ice – profiles in 14 layers at 21km horizontal resolution along with latent heat and surface rain, over a 760 km swath)

 

2A12 provides rainfall rates and the vertical structure of hydrometeors and latent heating based upon the nine channels of the TRMM microwave imager (TMI).  For a complete description of the TMI, the user should refer to Kummerow et al., (1998).

 

The algorithm is based upon a Bayesian approach that begins by establishing a large database of potential hydrometeor profiles and their computed brightness temperatures (Tb).  This database is computed from cloud resolving models such as the Goddard Cumulus Ensemble model.  Once the database is established, the retrieval searches the database and in Bayes's formulation, the probability of a particular profile R, given Tb can be written as:

                                           Pr( R | Tb )  ∝  Pr(R) x Pr(Tb | R)

where Pr(R) is the probability with which a certain profile R will be observed and Pr(Tb | R) is the probability of observing the brightness temperature vector, Tb, given a particular rain profile R.  The probability that a profile R will be observed is taken from the cloud profile database.  The second term is specified in the Bayesian formulation to be the gaussian weight which depends upon the RMS difference between observed and computed Tb.  A more complete description of this portion of the algorithm can be found in Kummerow et al., (1996).  Results on the latent heating retrievals from this algorithm can be found in Olson et al., (1998).
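As a rough illustration of this Bayesian weighting (not the TSDIS algorithm itself), the sketch below computes a weighted-average rain rate over a small database. The function name, the (rain_rate, simulated_tb) tuple layout, the sigma parameter, and the use of a posterior mean are all assumptions; Pr(R) is represented only implicitly by how often a profile appears in the database.

```python
import math


def bayesian_rain_estimate(observed_tb, database, sigma=2.0):
    """Weighted-mean rain rate from (rain_rate, simulated_tb) entries.

    The Gaussian weight stands in for Pr(Tb | R) and depends on the RMS
    difference between observed and simulated brightness temperatures.
    sigma (in kelvin) is a hypothetical tuning parameter.
    """
    weights = []
    for _rain_rate, simulated_tb in database:
        sq = sum((o - s) ** 2 for o, s in zip(observed_tb, simulated_tb))
        rms = math.sqrt(sq / len(observed_tb))
        weights.append(math.exp(-0.5 * (rms / sigma) ** 2))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no database profile is close to the observation")
    # Normalizing by the total weight turns the proportionality into a mean.
    return sum(w * r for w, (r, _tb) in zip(weights, database)) / total


# Made-up two-entry database: a dry profile and a rainy one.
db = [(0.0, [280.0, 270.0]), (10.0, [250.0, 240.0])]
print(bayesian_rain_estimate([265.0, 255.0], db, sigma=10.0))
```

An observation halfway between the two simulated vectors weights both entries equally, so the estimate falls midway between their rain rates.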

 

The algorithm implemented at TSDIS has the further requirement that the convective/stratiform (C/S) fraction of precipitation in the satellite field of view also match that given by the cloud model.  The C/S fraction is computed from the horizontal texture of the Tb field.  This technique is described by Hong et al., (1998).  A manuscript with the complete details of the algorithm is in preparation.

 

 

·        2A25 (PR, 13.8 GHz – rain rate, reflectivity, attenuation, profiles at 250m vertical and 4km horizontal resolutions, over a 220 km swath)

 

2A25 basically uses a hybrid of the Hitschfeld-Bordan method and the surface reference method to estimate the vertical true radar reflectivity (Z) profile.  (The hybrid method is described in Iguchi and Meneghini (1994)).  The vertical rain profile is then calculated from the estimated true Z profile by using an appropriate Z-R relationship.
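To make the last step concrete, the sketch below inverts a power-law Z-R relationship Z = a R^b. The coefficients a = 200 and b = 1.6 are the classic Marshall-Palmer values and are used here only as an assumption; 2A25 applies its own coefficients, which are not given in this document. The function name is likewise hypothetical.

```python
import math


def rain_rate_from_dbz(dbz, a=200.0, b=1.6):
    """Rain rate (mm/hr) from attenuation-corrected reflectivity in dBZ.

    Assumes Z = a * R**b with Z in mm^6 m^-3; a and b are illustrative.
    """
    z_linear = 10.0 ** (dbz / 10.0)  # convert dBZ to linear units
    return (z_linear / a) ** (1.0 / b)


# Sample reflectivities along a made-up vertical profile.
for dbz in (20.0, 30.0, 40.0):
    print(dbz, round(rain_rate_from_dbz(dbz), 2))
```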

 

The attenuation correction is, in principle, based on the surface reference method. This method assumes that the decrease in the apparent surface cross section (Δσ0) is caused by the propagation loss in rain. The coefficient α in the k-Z relationship k = αZ^β is adjusted in such a way that the path-integrated attenuation (PIA) estimated from the measured Zm-profile will match the reduction of the apparent surface cross section.  The attenuation correction of Z is carried out by the Hitschfeld-Bordan method with the modified α.  Since α is adjusted, we call this type of surface reference method the α-adjustment method. The α-adjustment method assumes that the discrepancy between the PIA estimate from Δσ0 and that from the measured Zm-profile can be attributed to the inappropriate choice of α values, which may vary depending on the raindrop size distribution and other conditions.  It assumes that the radar is properly calibrated and that the measured Zm has no error.

 

In order to avoid inaccuracies in the attenuation correction when rain is weak, a hybrid of the surface reference method and the Hitschfeld-Bordan method is used [Iguchi and Meneghini, 1994].  The PIA is first estimated from the precipitation echo alone.  The weight given by the hybrid method to the PIA estimate from the surface reference increases as the attenuation estimate increases.  When rain is very weak and the attenuation estimate is small, the PIA estimate from the surface reference is effectively neglected.  With the introduction of the hybrid method, the divergence associated with the Hitschfeld-Bordan method is also prevented.

 

One major difference from the method described in the above reference is that in order to deal with the beam-filling problem, a non-uniformity parameter is introduced and is used to correct the bias in the surface reference arising from the horizontal non-uniformity of rain field within the beam. Since radar echoes from near the surface are contaminated by the mainlobe clutter, the rain estimate at the lowest point in the clutter-free region is given as the near-surface rainfall rate for each angle bin.

 

 

Both the 2A12 and 2A25 datasets are orbital (swath-based) and instantaneous. That is, the likelihood of capturing a useful data instance depends on the chance of the satellite passing over the relatively small 4x4 degree Dhofar region during substantial rainfall (at intensities exceeding the minimum reflectivity detectable by the radar). Hence, only 28 2A12 (TMI) and 5 2A25 (PR) instances have shown significant rainfall and spatial distribution, and these have been processed and incorporated into the GIS. The specific instance dates can be inferred from their respective Views in the GIS or from their containing directories (2A12 and 2A25).

 

 

·        Other downloaded datasets provided in raw format on the HD are the GB31 and 3A25 datasets. These have not been analyzed, processed, or incorporated into the GIS; they are kept on the HD only to avoid re-downloading them should they be needed later.

 

 

 

Data Formats and Processing Overview

 

All data processing and GIS development has been performed on Windows XP Professional. This accounts for most of the tools and methodologies adopted, and it is the recommended platform for any additional work using the same tools and process flow.

 

All datasets have been downloaded from the TRMM website. The data is provided compressed in .Z or .TAR format. A tool for decompressing such formats (quickzip.exe) is provided in the tools directory.

 

The resulting HDF (Hierarchical Data Format) files are TRMM’s native format. This format can be visualized and manipulated using a number of tools described at http://daac.gsfc.nasa.gov/CAMPAIGN_DOCS/hydrology/hd_software.shtml, among many others available on the web.

 

To translate the HDF files into formats readily importable into the GIS, the TRMM Extractor ver.1.0 has been developed. It takes the .HDF file and generates any of the following formats: .txt, .csv, and .asc, for all or a subset of the source file. Since the downloaded files cover either the whole world or a full swath, TRMM Extractor can subset the data and discard everything outside the Dhofar region. The subset for the 2A12 and 2A25 datasets is 52E-56E/16N-20N (a 4x4 degree region); a larger subset, 52E-60E/12N-27N (an 8x15 degree region), is used for the 3B43 and 3B42 datasets because of their low spatial resolution.
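The subsetting idea can be sketched in a few lines of Python. This is not TRMM Extractor's C# code: the constant names, the (lon, lat, value) tuple layout, and the inclusive treatment of the boundaries are assumptions made for illustration.

```python
# Bounding boxes as (lon_min, lon_max, lat_min, lat_max), taken from the text.
DHOFAR = (52.0, 56.0, 16.0, 20.0)        # used for 2A12 and 2A25
DHOFAR_WIDE = (52.0, 60.0, 12.0, 27.0)   # used for 3B42 and 3B43


def subset(points, bbox):
    """Keep only (lon, lat, value) samples inside bbox, edges included."""
    lon_min, lon_max, lat_min, lat_max = bbox
    return [
        (lon, lat, value)
        for lon, lat, value in points
        if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
    ]


# Made-up samples: one inside the Dhofar box, one only inside the wide box,
# and one outside both.
samples = [(54.0, 17.0, 1.2), (58.5, 25.0, 0.4), (10.0, 45.0, 3.3)]
print(subset(samples, DHOFAR))
print(subset(samples, DHOFAR_WIDE))
```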

 

The .csv (Comma Separated Value) format is handy for use in spreadsheet programs such as Excel. A set of macros has been developed and embedded into the TRMMAnalyzer.xls spreadsheet for quick visualization, time-series plotting, aggregation, and summary statistics of the processed data. One of the macros (AggregateFields) also helps generate the time series files for the different datasets incorporated into the GIS. A file renamer utility (Bulk File Renamer) has been used to rename all files in batch to conform to the 8.3 file naming convention (filenames generated by TRMM Extractor are incompatible with ArcView).

 

The .asc file format is an ArcInfo grid format with header information on the contents of the grid file. It can be imported into ArcView and visualized as a grid. To automate the batch conversion from .asc files into higher-level ArcView grid files and point shape files, a 3rd party tool, GridMachine 5.2, has been used. Upon generating the corresponding grid and shape files, the core backbone for the GIS is ready.
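For reference, the sketch below writes a small grid in the ESRI ASCII grid layout (header lines ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value, followed by rows from north to south). The function name, the NODATA value of -9999, and the sample numbers are assumptions; TRMM Extractor's actual output may differ in number formatting.

```python
def format_ascii_grid(grid, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Return grid (list of rows, northernmost first) as ESRI ASCII grid text."""
    nrows, ncols = len(grid), len(grid[0])
    header = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    body = [" ".join(str(value) for value in row) for row in grid]
    return "\n".join(header + body) + "\n"


# A made-up 2x2 grid whose lower-left corner sits at 52E, 16N.
text = format_ascii_grid([[1.0, 2.0], [3.0, 4.0]], 52.0, 16.0, 2.0)
print(text)
```

The returned string can be saved with an .asc extension and imported into ArcView as a grid.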

 

From here onward, developing the GIS consists of grouping the data in various ways into Views made up of different Themes that are, in turn, assigned map Legends. This process, along with the preliminary sample analysis performed in some Views, is further discussed in the Dhofar GIS ver.1.0 section of this document.

 

 

 

Tools

 

TRMM Extractor ver.1.0

 

 

 

 

TRMM Extractor was primarily developed using C# on the .NET platform to fulfill the functions of: (1) subsetting TRMM data to the Dhofar region, (2) transforming from swath/orbital data (2A12 and 2A25) into adjusted uniform grid data, and (3) transforming data from .HDF format into more user-friendly formats, namely those readily importable into the GIS.
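Function (2), turning irregular swath samples into a uniform grid, can be approximated by averaging the samples that fall in each cell. The sketch below is one simple way to do this and is not claimed to be the method TRMM Extractor uses; the function name, the 0.5-degree default cell size, and the -9999 fill value are assumptions.

```python
import math


def grid_swath(points, bbox, cell=0.5, nodata=-9999.0):
    """Bin (lon, lat, value) samples into a uniform grid by cell averaging.

    bbox is (lon_min, lon_max, lat_min, lat_max); row 0 is the northern edge.
    Cells that receive no sample are filled with nodata.
    """
    lon_min, lon_max, lat_min, lat_max = bbox
    ncols = int(round((lon_max - lon_min) / cell))
    nrows = int(round((lat_max - lat_min) / cell))
    sums = [[0.0] * ncols for _ in range(nrows)]
    counts = [[0] * ncols for _ in range(nrows)]
    for lon, lat, value in points:
        col = math.floor((lon - lon_min) / cell)
        row = math.floor((lat_max - lat) / cell)
        if 0 <= col < ncols and 0 <= row < nrows:
            sums[row][col] += value
            counts[row][col] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else nodata
         for c in range(ncols)]
        for r in range(nrows)
    ]


# Made-up swath samples over the Dhofar box, gridded at 2 degrees.
swath = [(52.5, 19.5, 2.0), (53.0, 19.0, 4.0), (55.0, 16.5, 1.0), (60.0, 10.0, 9.0)]
print(grid_swath(swath, (52.0, 56.0, 16.0, 20.0), cell=2.0))
```

Averaging per cell is only one choice; nearest-neighbour or distance-weighted interpolation would fill empty cells differently.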

 

This version of TRMM Extractor only processes the 2A12, 2A25, 3B42, and 3B43 datasets, namely the SDS variables ‘cldWater’ (g/m3), ‘surfaceRain’ (mm/hr), and ‘geolocation’ in 2A12; ‘nearSurfRain’ (mm/hr) and ‘geolocation’ in 2A25; ‘precipitate’ (mm/hr) in 3B42; and ‘precipitate’ (mm/hr) in 3B43. This is a class

 

Running TRMM Extractor requires the .NET Framework 1.0 to be installed on the host machine (available on the HD). The 2 other dependency executables, hdf2bin-win2000.exe and hdp.exe (or copies), should always be in the directory where .HDF files are to be processed.

 

 

File System

 

You can either manually select the files to be processed using the ‘Browse’ button and making a multiple file selection, or use the ‘Get’ button to get all .HDF files in the current directory (where TRMMExtractor.exe resides). If the 2 other dependency executables, hdf2bin-win2000.exe and hdp.exe (or copies), are not available in the directory with files to be processed, TRMM Extractor will prompt you to copy them there. TRMM Extractor uses the .HDF extension to detect TRMM files, so changing the extension makes a file undetectable.

 

At the time of development, the TRMM data file naming convention was “TTTT.YYMMDD.X.HDF”, where TTTT is the type (2A12, 2A25, 3B42, or 3B43), YYMMDD is the date (year, month, day), and X is some other variable such as the orbit number (disregarded by TRMM Extractor). With the ‘Auto Detect?’ feature on, TRMM Extractor uses this naming convention to detect dataset types and process them accordingly; a file named in any other way will not be recognized. Should this convention change, or should files be custom-named, turn off auto-detection and specify the dataset type manually. Again, the file must be a 2A12, 2A25, 3B42, or 3B43 TRMM dataset. Support for additional datasets can be added by modifying the source provided here.
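The naming convention above can be checked with a short Python sketch. It mirrors the described behaviour but is not the TRMM Extractor source: the function name, the regular expression, the sample filenames, and the rule that two-digit years of 90 or more belong to the 1900s are all assumptions.

```python
import re
from datetime import date

# TTTT.YYMMDD.X.HDF, where TTTT is one of the four supported types.
NAME_PATTERN = re.compile(
    r"^(2A12|2A25|3B42|3B43)\.(\d{2})(\d{2})(\d{2})\.(.+)\.HDF$",
    re.IGNORECASE,
)


def parse_trmm_name(filename):
    """Return (type, date, X) for a conforming name, or None otherwise."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    kind, yy, mm, dd, extra = match.groups()
    # Assumed pivot: TRMM data start in the late 1990s.
    year = 1900 + int(yy) if int(yy) >= 90 else 2000 + int(yy)
    return kind.upper(), date(year, int(mm), int(dd)), extra


print(parse_trmm_name("2A12.980615.3142.HDF"))  # hypothetical filename
print(parse_trmm_name("rainfall.HDF"))          # does not follow the convention
```

A name whose digits do not form a valid calendar date raises ValueError from date(); a caller may prefer to treat that case as unrecognized too.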

 

 

Output

 

TRMM Extractor can process .HDF files into any of the following formats: .bin for binary (where applicable; intermediate format), .txt for ASCII (tab-delimited), .csv for Comma-Separated Value files, and .asc grid files readily importable into ArcView. The formats are generated in a chain rather than each from the source .HDF file: for example, the .csv file is generated from the .txt file, and similarly a header section is added to make the .asc file. This minimizes processing time by reusing already processed derivatives. In the directory of .HDF files, only files in the formats selected in ‘Output’ are kept; all others are deleted. So even though TRMM Extractor uses a .txt file to generate its .csv derivative, no .txt file remains in the directory unless .txt is selected in ‘Output’. Also, if a new processing run finds leftover processed files in the directory, it reuses them to generate the derivative formats. It is therefore advised that you delete all unneeded files and byproducts before a new processing run.
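One link of this chain, turning tab-delimited text into comma-separated values, might look like the following Python sketch. The function name and the in-memory string interface are assumptions chosen to keep the example self-contained; the real tool works on files.

```python
import csv
import io


def tab_to_csv(text):
    """Convert tab-delimited text into CSV text, one row per input line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        writer.writerow(line.split("\t"))
    return out.getvalue()


# A made-up 2x2 block of rain rates as it might appear in a .txt file.
print(tab_to_csv("0.0\t1.5\n2.0\t0.3\n"))
```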

 

 

Subset

 

Use this to narrow down to the geographic region of interest. The increment size is set to 0.5 degrees in TRMM Extractor; in principle there is no constraint on it, and it can be changed in the source code.

 

Selecting the ‘Similar to source’ option for the 3B43 and 3B42 datasets yields the global belt from 40 degrees south to 40 degrees north of the equator (that is, the same extent as the source). For the 2A12 and 2A25 datasets, this option (the default) yields the Dhofar region (52E-56E/16N-20N); make an explicit selection if a larger subset is desired. (This default has been set only for the much larger 2A25 and 2A12 data files, because of their long processing time and the risk of forgetting to subset them and ending up with lengthy, unnecessary data.)

 

Output filenames for subsetted data have the ‘Subset’ keyword before the file’s extension. This is the case for all 2A12 and 2A25 files, since outputs are Dhofar subsets by default.

 

 

Processing

 

This listbox displays the set of files being processed or waiting to be processed. Filename highlighting, along with the messages in the status bar, keeps track of processing progress.

 

 

Other Issues

 

-         The 2A12 dataset contains both rainfall and cloud data. Once an .HDF file has been detected as 2A12 (or specified as such manually), the program asks whether rainfall or cloud content is to be analyzed. If it is the latter, the cloud layer (height from ground) is selected from among 14 layers in total (varying from 0.5 to 20km from Earth’s surface). This can be set once for all subsequent 2A12 files detected in a single processing run of many data files.

 

-         At any point during processing, you can stop TRMM Extractor. The program is in an unstable state from then on, and it is advised that you close and reopen TRMM Extractor.

 

-         File format and dataset type detection relies heavily on file names and extensions, so as to avoid having to analyze file contents to determine type. It is therefore recommended that the paths to data files not contain keywords such as ‘txt’ and ‘asc’.

 

-         It is recommended that you do not mix files from different sources (directories) in the same processing run. The best way to use TRMM Extractor is to place the TRMMExtractor, hdf2bin-win2000, and hdp executables in the same directory as all files to be processed, and then use the ‘Get…’ feature to retrieve all recognized .HDF files in the current directory.

 

-         This version of TRMM Extractor was not intended to be a fully modular framework for analyzing all TRMM datasets. Due to the limited timeframe and resources, the current version of the software (especially the initial interface form) was highly customized for the intended datasets 2A12, 2A25, 3B42, and 3B43, all analyzed via dataset-specific C# classes. The developer is expected to add new datasets as necessary, enhance the existing algorithms (namely for the orbital datasets), and evolve the overall interface toward a more modular, fully object-oriented TRMM Dataset Analysis Framework.

 

 

TRMM Analyzer (in Excel)

 

 

This is a VBA macro-based spreadsheet with a set of tools (most of which can be triggered from the TRMM toolbar embedded in the spreadsheet) to quickly summarize, aggregate, and draw time series from .csv files generated by the aforementioned TRMM Extractor ver.1.0.

 

‘Get TRMM Files’ opens all selected .csv files as worksheets in a single workbook (to facilitate time series drawing and navigation), and then rearranges them according to year (files for years 00-02 should be manually moved to the end when done).

 

‘Time Series’ aggregates (in date row format) and plots the time series for the selected cell(s) – the average, sum, minimum, maximum, and/or standard deviation of all those selected – taken from all worksheets in the current workbook. This is like a see-through of a given longitude/latitude bounded region from all time instances opened using the ‘Get TRMM Files’ button. The statistic actually plotted can be modified from the VBA source accessible using Alt+F11.
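The "see-through" idea can be sketched outside Excel as follows. This is a Python illustration rather than the VBA macro: the function name, the dictionary keyed by ISO date strings, and the use of the population standard deviation are assumptions.

```python
from statistics import fmean, pstdev


def region_time_series(grids_by_date, rows, cols):
    """Summarize the same block of cells in every grid, ordered by date.

    grids_by_date maps an ISO date string to a grid (list of rows);
    rows and cols are sequences of the selected row and column indices.
    """
    series = []
    for day in sorted(grids_by_date):
        grid = grids_by_date[day]
        cells = [grid[r][c] for r in rows for c in cols]
        series.append((day, {
            "mean": fmean(cells),
            "sum": sum(cells),
            "min": min(cells),
            "max": max(cells),
            "std": pstdev(cells),
        }))
    return series


# Two made-up time instances on a 2x2 grid; select the first column.
grids = {
    "1998-06-01": [[1.0, 3.0], [5.0, 7.0]],
    "1998-05-01": [[0.0, 0.0], [2.0, 2.0]],
}
for day, stats in region_time_series(grids, range(2), range(1)):
    print(day, stats)
```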

 

‘Plot Layers’ is a modified version of the ‘Time Series’ feature as applied to cloud layers (in the case of 2A12 dataset) instead of time series.

 

‘Lon/Lat’ gives the longitude and latitude of any selected cell. (This depends heavily on the source subset and simply uses a manually set offset to calculate the longitude/latitude; see the source, Alt+F11.)

 

‘Stats’ summarizes the statistics of a selected area. To summarize a whole data instance (i.e. a worksheet, in this case), just select the whole sheet (Ctrl+A) and click ‘Stats’.

 

‘Remove Sheets’ removes all worksheets from the workbook.

 

‘Rearrange’ arranges all files according to the date in the filename. This, of course, is heavily dependent on the TRMM naming convention. 00, 01, and 02 years start with a zero and are misarranged; simply select all 0xxxxx worksheet tabs and move them together after the last 99xxxx worksheet.
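The misordering arises because a plain text sort puts ‘00’ before ‘98’. A sort key that expands the two-digit year avoids the manual step; the Python sketch below shows the idea. It assumes worksheet names begin with YYMMDD and that years of 90 or more belong to the 1900s; the function name is hypothetical and this is not part of the VBA source.

```python
def sheet_sort_key(name):
    """Sort key for names starting with YYMMDD, using an assumed year pivot."""
    yy = int(name[:2])
    year = 1900 + yy if yy >= 90 else 2000 + yy
    return (year, name[2:6])


# Made-up worksheet names spanning the century boundary.
names = ["000501", "990930", "980615", "020701"]
print(sorted(names))                      # plain text order
print(sorted(names, key=sheet_sort_key))  # chronological order
```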

 

Other features are only accessible from the VBA editor (Alt+F11); the corresponding macro or subroutine can be run using F5. One of these is the ‘AggregateFields()’ subroutine, which is used to generate the aggregated time series files used in the GIS (see source).

 

 

GridMachine ver.5.2

 

GridMachine is a 3rd party product used to automate the conversion of the .asc files generated by TRMM Extractor to ArcView grids in batch (multiple files in succession). Point shapefiles are also then generated from the grid files to provide scatter-like layers that conveniently overlay over other grid or image layers. Point shapefiles are also necessary for generating the time series shapefiles catering for time series charting from their attribute tables.

 

Adding GridMachine simply involves adding the corresponding extension to the EXT32 directory (as described here). The unregistered version limits batch processing to 6 files at max. The registration credentials for a 42276544 User ID (resulting from installing ArcView from version provided on the HD) are a ‘Hongfei Tian’ full name and an ‘FFOT-38137989’ license number.

 

GridMachine is fully documented and updated at the author’s website (http://www.ecogis.de/gridmachine_de.html).

 

 

 

Dhofar GIS ver.1.0

 

The Dhofar GIS is an ESRI ArcView project combining different thematic maps and datasets. The current version should serve as the backbone for any further GIS development; it provides a ‘GIS-friendly’ version of all datasets studied thus far.

 

How easily the system can be understood, used, and built upon depends largely on the user’s fluency in ArcView, among other ESRI products. It is highly recommended that the user work through some introductory ArcView tutorials and documentation in order to get the most out of using and developing the Dhofar GIS.

 

           

Installation and Requirements

           

ArcView 3.3 can be installed from the HD (setup.exe). The registration number is provided here. If the platform is Windows XP (recommended), install the latest patch for this OS from ESRI (ArcViewGISPatch4WinXP.EXE).

 

To run the Dhofar GIS, the ArcView 3D Analyst extension must be installed. It is provided on the HD, and installation instructions are provided here. The project will not open without 3D Analyst installed. Spatial Analyst is not required for opening the project and viewing its contents, but any useful analysis will depend heavily on this extension. It is also available on the HD.

 

Many other 3rd party and optional extensions are provided here for enriching the ArcView environment, as described in the directory’s ‘readme.txt’. Note that these extensions are not required for opening, viewing, and performing analysis in the current version of the Dhofar GIS, but they may be needed for adding new datasets and further developing the system.

 

 

Contents and Structure

 

The file structure of the \Dhofar_GIS directory is described in contents.txt. In ArcView, this data, consisting of thematic maps and preliminary analysis derivatives, is grouped into different ‘Views’. These ‘Views’ are descriptively named in the initial dialog. One of the less obvious ‘Views’ is ‘Topography Profiling around Ground Stations’, in which the user is expected to use Profile012.avx to draw profiles intersecting contour lines, for example to plot the cross-section of a proposed wind path.

 

Of course, new ‘Views’ can be generated on the fly. Themes can be copied and regrouped from the already provided ‘Views’ or directly added from the HD (all GIS files are in the \Dhofar_GIS subdirectories.)

 

 

Customization

 

At a later stage, this project file can include customization information for dialogs, buttons, and menus custom-made for the Dhofar project. Once more datasets have been incorporated, a more intuitive, higher-level interface could be developed to access data more efficiently. At this stage, however, such customization would restrain rather than facilitate the system, since premature customization would impede its further development. Adopting a low-level ArcView architecture built on its core features and analysis capabilities (coupled with those of the powerful 3D and Spatial Analyst extensions) will serve as a solid cornerstone for the system’s evolution, through which more datasets, meaningful analysis, and Dhofar GIS-centric customization are expected to be incorporated.

 

Tuesday, June 10, 2003

Loai Naamani