Delivered-To: igs-dcwg@igscb.jpl.nasa.gov
Date: Mon, 16 Jun 2003 14:55:06 -0700 (PDT)
From: Michael Scharber
To: igs-dcwg@igscb.jpl.nasa.gov
Subject: [IGS-DCWG-12] Re: [IGS-DCWG-10] Resupplied Data, Comments from BKG
In-Reply-To: <6D9A63E05E7DD211A1AB0008C71E6F8A01E4EBD2@pegasus.ifag.de>
Message-ID:
Sender: owner-igs-dcwg
Precedence: bulk

******************************************************************************
IGS-DCWG Mail      16 Jun 14:55:10 PDT 2003      Message Number 12
******************************************************************************

Author: Michael Scharber/SOPAC

I have been pleased to read the various points offered on this thread
and would like to throw in my own humble observations.

First of all, I'm ashamed to ask but I must: Is there A SINGLE LIST of
sites (four character code or whatever), maintained by A SINGLE
person/agency, pertaining to data that MUST be archived at all global
data centers? And, by list I mean something available anonymously
through ftp|http which is also easily parsed AND limited only to these
sites.

I only ask because I myself do not even know, specifically, which sites
comprise the set which all GDCs are required to archive, and therefore
mirror from/to other GDCs. My ignorance aside, if I can get this topic
straight, then I at least know what it is we're raising questions about
in the first place.

Here then are three topics of personal interest to me.

a) IGS Network Topology --- PUSH, DON'T PULL ---

I fully support the notion that data for "these" sites (pending
definition or my own education) ALWAYS follow a PUSH strategy where
(at least):

  a1. The originator of data for a given site [re]supplies data to an
      IGS archive (regional, global or whatever) whenever THEY feel it
      necessary.

  a2. ONE AND ONLY ONE global data center supplies THE OTHER TWO with
      data for a subset of sites from "the list". This subset is
      mutually exclusive with regard to the subsets of the other global
      data centers.

  a3. There may be originators of data for one or more sites from "the
      list" who can very well PUSH their data to ALL THREE global data
      centers. These sites would then NOT exist in the subset lists of
      any of the GDCs.

  a4. If I don't make any sense on any of a1-a3 then remember this...no
      GDC PULLS data from another GDC.

Ideally, to best achieve the "mirror" among GDCs (in my opinion), there
should be NO PULLING of data among GDCs. Each GDC should have an upload
ftp server ready to receive data from other GDCs and IGS data centers
PUSHING data onward.

b) "Knowledge" of [re]submitted IGS raw data files

I agree with Heinz that the GSAC could serve very well as an effective
means of identifying data resubmissions, as well as primary
submissions. I think there is a great deal of utility inherent in the
GSAC which is not being used. One example, which Heinz points out, is
data "publication" time. This piece of information, attributed to every
file published to the GSAC, is (by definition) in UTC. It is also
something that gets updated when data is "republished" to the GSAC,
thereby lending itself to useful statistics gathering and informed
re-retrieval on the part of the user community.

Of course, the catch is all GDCs must be GSAC Wholesalers. We're close
to this actually, with SOPAC and CDDIS already GSAC Wholesalers, and
IGN working on becoming one shortly.

I believe that if we can straighten out these two aspects of IGS data
archiving then we have a good base to begin approaching some of the
many good points offered by Edouard, Nacho, Carey and Heinz.

c) IGS Data Resubmissions.....Notification Service:

I like this concept a lot but feel it needs further discussion. Most
importantly, how many of these services would exist? One? Three (one
for each GDC)? Or dozens (one at each GPS archive sprinkled around the
world)? How many should a user know/care about? How many would be
posting redundant messages (messages already posted by another
archive)?
I think there would be more trouble generated, and user confusion
created, if more than one such service exists. How can there only be
ONE then?

I think that IF, at least to start, all GDCs perform as GSAC
Wholesalers then a single, third-party, agency/individual (perhaps an
analysis center) could use the GSAC (write a simple application to
routinely check for resubmissions of IGS data and weed out duplicate
copies) to host a DCWG Resubmission listserver similar to this
one......creating emails with a specific format that, perhaps, Nacho
could supply. Then, the IGS mailing list would be trimmed of such
messages, and users could choose to subscribe to the DCWG Resubmission
listserver if they care to. The emails, as Nacho explains, would be
both human readable and machine parsable.

The catch (at least with regard to the GSAC) is there is currently
NOTHING in the GSAC to allow for a statement of "why" a particular data
file was resubmitted. That's a problem.....but something the GSAC could
possibly be adapted to handle.

Sorry this message was sooooooooo long.

Best Regards everyone.

Michael

> Dear Colleagues,
>
> some remarks to the discussion about resupplied data:
>
> 1) It is practical to distinguish
>
>    Case a) Data flow between various data centers (operational, local,
>    ======= regional and global)
>
>    and
>
>    Case b) Data flow between a data center and an analysis center
>    =======
>
> 2) If all data centers follow the "put approach", resubmitted files
>    could easily and correctly be handled. As soon as an updated file
>    occurs in the "incoming" directory of a data center it will be
>    forwarded to the next level, e.g., from local to regional data center.
>    "Case a)" could thus be satisfied. The mirror between the
>    global data centers has to be arranged differently.
>
> 3) The announcement of a file resubmission by IGS-Mail is meaningful, but
>    not reliable enough from the analysis center's point of view.
>
> 4) More difficult to handle is "Case b)". The analysis centers need to
>    know about new submissions. The GSAC (GPS Seamless Archive Center)
>    initiative stores the file creation date and the corresponding providers
>    into the "data holding file (*.dhf)". GSAC could serve as the
>    "central information source" for analysis centers to check for
>    resupplied data (provided all data centers participate in GSAC).
>
> Perhaps my remarks could contribute to the discussion.
>
> With kind regards,
>
> Heinz
>
> -------------------------------------------------------------------------
> Dr. Heinz Habrich
>
> Bundesamt fuer Kartographie und Geodaesie
> Richard-Strauss-Allee 11, 60598 Frankfurt am Main, Germany
>
> Phone  +49 69 6333267         E-Mail  heinz.habrich@bkg.bund.de
> Fax    +49 69 6333425         URL     http://www.bkg.bund.de
> -------------------------------------------------------------------------
>

--
*******************************************************
Michael Scharber
Scripps Institution of Oceanography
Institute of Geophysics and Planetary Physics
8785 Biological Grade
IGPP Room 4212
La Jolla, CA 92037
mscharber@josh.ucsd.edu
(858)534-1750
*******************************************************
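
Illustrative sketch only, not part of the original message or of the
GSAC software: the resubmission check proposed under (c) could look
roughly like the Python below. It assumes each wholesaler's holdings can
be read as (wholesaler, filename, publication time in UTC) records; the
Holding type, its field names, the function name and the file names are
all hypothetical. A file counts as resubmitted when a copy's publication
time moves forward between two checks, and copies held by several GDCs
collapse into one notice.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Holding(NamedTuple):
    """One published copy of a file; all field names are hypothetical."""
    wholesaler: str      # e.g. "SOPAC" or "CDDIS"
    filename: str        # placeholder file name, e.g. "abcd1670.03d.Z"
    published: datetime  # publication time, in UTC


def find_resubmissions(previous, current):
    """Return one (filename, time) notice per republished file.

    previous, current: iterables of Holding from two successive checks.
    """
    # Latest publication time already seen for each (wholesaler, file) copy.
    seen = {}
    for h in previous:
        key = (h.wholesaler, h.filename)
        seen[key] = max(seen.get(key, h.published), h.published)

    notices = {}
    for h in current:
        old = seen.get((h.wholesaler, h.filename))
        if old is not None and h.published > old:
            # Duplicate copies of one file: keep the earliest republication.
            first = notices.get(h.filename)
            if first is None or h.published < first:
                notices[h.filename] = h.published
    return sorted(notices.items())


if __name__ == "__main__":
    day1 = datetime(2003, 6, 16, 12, 0, tzinfo=timezone.utc)
    day5 = datetime(2003, 6, 20, 12, 0, tzinfo=timezone.utc)
    before = [Holding("SOPAC", "abcd1670.03d.Z", day1),
              Holding("CDDIS", "abcd1670.03d.Z", day1)]
    after = [Holding("SOPAC", "abcd1670.03d.Z", day5),
             Holding("CDDIS", "abcd1670.03d.Z", day5)]
    for name, when in find_resubmissions(before, after):
        print(name, when.isoformat())
```

Files seen for the first time (primary submissions) are not reported,
since there is no earlier publication time to compare against. The
sketch says nothing about "why" a file was resubmitted, which is the gap
noted in the message above.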