Kendra Daly's data update work (krill and zooabund_lmg):
data path: gb6.whoi.edu\data12\sodata\kdaly\mocdata\revised_data_2008
info path: my test site: gb6.whoi.edu\data5\globec\testobjects\ncopley

2008030318:
Current data path/file: gb6:data12/sodata/kdaly/mocdata/GFH/zooabund_new.txt and krill_new.txt and .ind files
.objects & .remote... files are in gb6.whoi.edu/data5/globec/objects/globec/soglobec

First I am figuring out what the data that we currently have looks like.
Doing an ExamDiff on flat files of the krill (K) and zooabund_lmg (Z) objects at the .flat3 level gives similar but different data sets:

2001:
1. K has no tow 1 data. K calls first station = '1' while Z uses '1A'. Both agree for station '1B'. The eventlog uses 1A and 1B, so:
   done -K sta. 1 should be changed to 1A. Affects tows 1 and 2.
   done -changed time_local_begin from 500 to 456 to agree with event log.
2. Station 2 date differs: day_local in K is 18 and in Z is 19; other metadata agrees.
   MOC-13 and 14 were on 18th, MOC-15 was on 19th; both objects state that tow=15, so:
   done -date should be changed to 19th in K
3. No tow 10 data for Z but K has krill counts **is this correct?**

2001, sta. 6, tow 17:
4. dates differ:
   day_local for K = 25. done -change to 27th.
   day_local for Z = 27; this agrees with event log
5. See (revised) data file: 2001_St6C17_LF.xls, 'net 8 LF' page.
   net 8 (surface tow) has counts for Z (no abundance because vol_filt is nd) but not for K:
   There are plenty of krill in both files - added the krill data from the kdaly file.
   approx. depth given 0-5-10 in excel sheet, it's "Surface ~ 5-10 m" - ok.

2002 data:
6. sta. 1B, tow 8: Z has no line of data for net 5 while K does. In K, taxon is listed as "Lost at Sea". There is no 'comments' column, which is where this info belongs.
7. done -(DMO to fix) Z column width for taxon needs to be adjusted - too short.
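
The K-versus-Z checks listed above were done by eye with ExamDiff. A minimal sketch of the same kind of comparison in Python follows, assuming both objects have been exported as tab-delimited text with a header row. The column names (year, tow, station, day_local) and the sample rows are assumptions made for illustration, chosen to mirror discrepancies 2, 3 and 4 above; they are not the served data.

```python
import csv
import io


def read_rows(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def tow_metadata(rows, key_fields, check_fields):
    """Map each key (e.g. year + tow) to the first metadata tuple seen for it."""
    meta = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        meta.setdefault(key, tuple(row[f] for f in check_fields))
    return meta


def compare_metadata(k_rows, z_rows, key_fields, check_fields):
    """List keys that are missing from one object or whose metadata differ."""
    k_meta = tow_metadata(k_rows, key_fields, check_fields)
    z_meta = tow_metadata(z_rows, key_fields, check_fields)
    report = []
    for key in sorted(set(k_meta) | set(z_meta)):
        if key not in k_meta:
            report.append((key, "missing in K"))
        elif key not in z_meta:
            report.append((key, "missing in Z"))
        elif k_meta[key] != z_meta[key]:
            report.append((key, f"K={k_meta[key]} Z={z_meta[key]}"))
    return report


# Illustrative rows only (hypothetical layout, not the real files).
K_SAMPLE = (
    "year\ttow\tstation\tday_local\n"
    "2001\t15\t2\t18\n"
    "2001\t17\t6\t25\n"
    "2001\t10\t4\tnd\n"
)
Z_SAMPLE = (
    "year\ttow\tstation\tday_local\n"
    "2001\t15\t2\t19\n"
    "2001\t17\t6\t27\n"
)

report = compare_metadata(
    read_rows(K_SAMPLE),
    read_rows(Z_SAMPLE),
    key_fields=("year", "tow"),
    check_fields=("station", "day_local"),
)
for key, problem in report:
    print(key, problem)
```

Keying on year and tow rather than on station is a deliberate choice here, since station labels ('1' vs. '1A') are one of the things being checked.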
----------------------
20080320:
currently served data files: gb6.whoi.edu\data12\sodata\kdaly\mocdata\newmocdata
other important files: gb6.whoi.edu\data5\globec\objects\globec\soglobec\
.objects path: .remoteobjects krill.info krill_SO.xml zooabund_lmg.info zooabund_SO.xml

20080321:
??how is '<' and '>' made acceptable? (moc16)
done -changed to lt_15 and gt_15. moc1, moc2 use lt_15

Krill file (krill_new_volfilt.xls):
date changed: Z: LMG0203, sta.1, tow7 - from 4/17 to 4/25.
**event 192, MOC1 tow 15 has sta = nd in the event log but is sta=3 in the datafile zooabund_new.txt. Map of MOC-15 shows it is not near station 3.
sta changed: LMG0203, tow 15 - from sta3 to nd.

20080324:
done -compare the 2 sets of volfilt in original datasets (use new excel files, 'volfilt_old').
**2001: Krill has data for tow-1 and tow-10 but Zooabund has none.
done -**2002: tow-13 net-5 volfilt is 391 in Krill but 591 in Zooabund. In Revised_metadata_1mMOC.doc, it's listed as 591 so Z was corrected.
2002: tow12: diff. volfilt for K and Z in served data. **Changed both to new values.

Put krill_new_njc.txt and zooabund_lmg_new_njc.dat on my test site and they look good except for the fixes that still need to be made (above).

*Added a column in Zooabund called taxon_group which also exists in zooabund_nbp. It has 3 categories: Copepoda, Euphausiacea, and Other_Zooplankton. This somewhat solves Jason's request (email, 3mar04) to be able to look at just the copepods or another group. If he wants to look at something like molluscs, he'll have to subset the data.

*the variables time_local_begin and time_local_end are different in the thesaurus: time_start_local and time_end_local. I didn't change them because it seems that time_local... is a better name - the change is at the end of the name rather than in the middle.
*use time_start_local and time_end_local - **edit thesaurus** **be careful - check that the mapping tool won't be broken **

20080325:
In Z: changed Unidentified_Copepodite to Unidentified_or_Other in order to accommodate species that were identified but on the standard list that are not necessarily copepodites.

Abundance:
2001, sta. 1A, tow1 - no data for this tow was in the served data so I added it.
2001, sta. 1B, tow3, net4 - abundances were not calculated in the spreadsheet. I made the calculations in the file 2001_St1C3_LF.xls page net_7_LF and added them to the data file.

20080401 - 02:
Krill length frequency data: this data is in excel files, one file per tow and one sheet per net. Only the euphausiid data needs to be added to soglobec; the rest is in zooabund_lmg. I put together a sample file of just one net to see what Dicky thinks about the format I've come up with. It's about 150 lines but this will vary depending on how many individuals are measured for a particular sample.

Then I decided the length of each individual krill wasn't necessary and the original krill.txt file is fine - it has the number of krill in each length bin and the abundances for each bin and that's sufficient.

On Dicky's suggestion, I wrote to Kendra to see what she prefers:

4/2/08: Nancy Copley wrote:
> Kendra:
>
> I've been working on getting your corrected volume filtered and abundance data into the SOGLOBEC data system.
> The zooabund_lmg set is done.
> The krill may also be done, but I wanted to check with you about what you want to see served. I have two options, one is basically the same 'krill' object that was there before, but with the volfilt and abundances corrected. The other object is only completed for 2001, MOC-1, net-1 as a test. Before I go any further, I want to find out from you if this 'krill_length' data is what you need.
The main difference is that the 'krill' object has the number of individuals for each length bin while the 'krill_length' object shows each individual length as well. Please take a look at the data in my test site, http://globec.whoi.edu:8081/jg/dir/globec/gb/test/ncopley/
> and let me know how you'd like me to proceed.
>
> -Nancy Copley

4/11/08:
Hello Nancy,
Thanks for trying different formats. I like adding the additional information. It saves the user some data manipulation. However, I was confused by the current format. Would it be OK to use '0' (zero) instead of 'ND'? Would the following format or sequence of columns make sense?
Stage length_range actual_length sample_fraction count abundance count_per_length abundance_per_length count_per_species abundance_per_species epibionts comments
Thanks,
Kendra

4/23/08: Nancy Copley wrote:
> Kendra:
>
> We can use zero instead of 'nd' if the count is actually zero and not just unobserved. I think that in the epibionts column, you might want zeros if you can confirm that they were always looked for. If there were blanks in the spreadsheet, I used 'nd'. The other columns need the nd's because they are counts or abundances for a range of rows and the numbers there are the sum for the preceding rows (does that make sense?).
>
> -Nancy

4/23/08
Hello Nancy,
I agree that zeros make more sense than nd in all the cases you mentioned. The blanks were zeros, even for the sums of preceding rows. For example, we counted, staged, and measured all larval krill in a subsample and there were no individuals in some stages and length categories.
Kendra

05/07/08 - *included some of the following email in info file (7/31/08)
Hi all,
I had to go back and look at our methods. My lab started the net sample analyses and we measured lengths of krill to either the nearest 0.25 mm for larvae or to the nearest 0.5 mm for older stages.
As I recall, after we sent splits to the Russians to complete the sample analyses, they decided it took too long to do individual measurements and placed larvae into 0.5 mm length groups (3.0-3.4 mm) and the older stages to the nearest 1.0 mm. It looks like Jason took his original measurements to 0.25 mm and placed them in the 0.5 mm intervals used by the Russians. The 0.5 mm length bins (3.0-3.4; 3.5-3.9;) should be the same for all net tows. If you think it is confusing to have the larvae measured to the nearest 0.25 mm for some, but not all tows, then they could be consolidated.

If I understand Bob Groman's point, the file would be:
Stage length_min Length_max sample_fraction count_per_length_interval abundance_per_length_interval ... etc.
3.0 3.4 1/500 1 3.1056

For the epibiont and comment information, the occurrence of epibionts was much higher in 2001. This information is only useful as a relative measure of percent occurrence. I have seen these ciliates before, but they are episodic. The comments primarily provide information on indirect development. The larvae were placed into a particular life history stage category, but they don't have the exact number of telson spines, etc., typical of that stage. I reported this information in my 2004 DSR paper, so maybe this is not helpful here.
---------------------------------------------------
Only 5 tows have actual length; the rest have ranges.

To split ranges up into two columns (replace hyphen with tab):
copy column into textpad
replace: \-
with: \t
copy the two new columns into two new columns in krill_lengths.xls labeled lenbin_min and lenbin_max.

The lengths that are ranges use commas in place of decimal points.
In excel, edit > replace > , with .
choose 'within sheet'; 'by column'
Beware - this may also change other commas on sheet.
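
The two manual steps above (hyphen to tab, then comma to decimal point) could also be done in one pass restricted to the length column, which avoids the risk of changing commas elsewhere on the sheet. A minimal Python sketch, assuming the ranges look like '3,0-3,4' and that the column is named length_range (the name is taken from Kendra's proposed column list; the sample rows are invented for illustration). Treating a single value as a bin with equal min and max is a choice made for this sketch, not something the notes specify.

```python
def parse_length_bin(text):
    """Split a length range such as '3,0-3,4' into (min, max) floats.

    Decimal commas are converted to decimal points first. A single value
    with no hyphen is returned as (value, value). Lengths are assumed to
    be positive, so a hyphen is never a minus sign.
    """
    cleaned = text.strip().replace(",", ".")
    low, sep, high = cleaned.partition("-")
    if not sep:
        value = float(low)
        return value, value
    return float(low), float(high)


def add_bin_columns(rows, column="length_range"):
    """Return copies of the rows with lenbin_min and lenbin_max added.

    Only the named column is parsed; every other field is left untouched.
    """
    out = []
    for row in rows:
        low, high = parse_length_bin(row[column])
        new_row = dict(row)
        new_row["lenbin_min"] = low
        new_row["lenbin_max"] = high
        out.append(new_row)
    return out


# Illustrative rows only (hypothetical values).
sample = [
    {"length_range": "3,0-3,4", "comments": "a, b"},
    {"length_range": "3.5-3.9", "comments": "nd"},
    {"length_range": "3,25", "comments": "nd"},
]
converted = add_bin_columns(sample)
for row in converted:
    print(row["length_range"], row["lenbin_min"], row["lenbin_max"])
```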
New data file is krill_length_minmax.txt
I added min and max columns for both the bins and the actual lengths since most of the tows don't have lengths, but rather have length ranges which are a smaller range within the length_bin. The length min/max should probably not be displayed since they're fairly redundant with the length_bin.
---------------------------------------------------
6/12/08:
I got the 2001 tows 1-14 reformatted and in trying to join them found I needed to add a column for 'inst' (=MOC1) in order to have enough information to do the join.
* I found that tows 13 and 14 had the wrong station (2 - should be 4), resulting in the join not getting these two datasets.
* The row count for the join was >1000 short of the original (7587 vs. 8964). This meant that the rows with a comment in them were being ignored during the join because they didn't match the comments in the eventlog, which are all nd's.

6/17/08:
All 2001 tows are reformatted (1-18). Put them into 1 tab delimited text file (krill_length_2001_join.txt), served ok. Join wasn't working but Warren fixed something on his end so it works now as long as I go to http://optserv1.whoi.edu:8200 before doing the join with the eventlog. (11303 lines)
- next step: copy the resulting joined file, paste it into TextPad.
To replace spaces with tabs in TextPad: Replace [space][space] with ,, then replace ,, with , repeatedly until only commas separate the columns.
To remove commas at end of lines: find: [,]\n replace with \n.
Save the csv file and open in excel, using comma separation. Save it as tab file and open in Textpad again, save as Unix type file.

6/19/08:
Join-2001 was missing most of MOC-14 because the station was wrong - should be station 2 not 4 as in kdaly xls file.
Final row count = 11305 (no header lines, just column header and data) for krill_length_2001.txt and krill_length_2001_join.txt
---------------------------------------------------
6/30/08:
Some of the 2002 files are in a different format.
Counts & abundances of the lengthbins and species are not calculated and so I'm doing it manually as needed. Also, the length bins are not listed. I'm adding them, same as other files. --SLOOOOWWW!

7/1/08:
Finished reformatting 2002 tows. They serve ok using .object line:
krill_length_2002=defgb(@/data12/sodata/kdaly/mocdata/revised_data_2008/krill_length_2002.ind)
To get join to work (i.e. match eventlog), had to change station 1B to 1b, and for MOC-3 changed station from 7 to 5; Dicky changed eventlog MOC1-1 station from nd to 7.
Final row count = 7523 (no header lines, just column header and data) for krill_length_2002.txt and krill_length_2002_join.txt

7/7-8/08:
Warren made a join_rs that didn't delete duplicate lines using a newer version of rs (2.0). These are the lines to type in on a terminal window to create the restructured file, actually all in 1 line:
/data/wsass/port8200/data/methods/rs "defgb(@/data12/sodata/kdaly/mocdata/revised_data_2008/krill_length_2001_join.ind)" 0:cruiseid 0:year 0:inst 1:station 2:cast 2:lat 2:lon 2:depth_w 2:month_local 2:day_local 2:time_local 2:month_gmt 2:day_gmt 2:time_gmt 3:net 3:vol_filt 3:depth_open 3:depth_close 4:taxon 5:stage 5:samp_fraction_denom 6#* > /tmp/njc_krill_2001_join_newrs_w_nocompress.txt
#this is the file where the new data goes#
*add space in the command line after 'rs' (see above lines 212-220).

So, the good files to serve are called:
njc_krill_2001_join_newrs_w_nocompress2.txt ===> krill_2001_rs.txt
njc_krill_2002_join_newrs_w_nocompress2.txt ===> krill_2002_rs.txt

I didn't have the depth_open or depth_close in my dataset so I went back and added them into krill_lengths.xls, recreated the krill_length_200#.txt file, joined (creating files krill_length_200#_join.txt) and restructured again. Resulting files are of same length as originals so I think they're good.

Plotted up the cast locations with matlab and made a figure for info file.
Previously for krill.info_njc.html, I updated the variables table, inserted new info (text and plot) from Kendra/Jason, etc.

7/28/08:
Trying to get a new restructured file with another level - have to do it on xterm rather than in .objects in order to use the new version of rs. But can't quite get it to work. The program doesn't seem to exit: instead of the usual prompt ($), I get (>). This is due to my lack of knowledge of unix. [see 7/31/08 for solution]

7/29/08:
Info files: changed paths to http://globec.whoi.edu/images/ from local directory.
zooabund_lmg.info-njc.html == zooabund_new_njc.info
krill.info_njc.html == krill_2001_rs.info - this same info file can be used for all krill objects, I think.
served data: krill_length_200x_join and krill_200x_rs both show time with no leading zeroes (e.g. 0015 is shown as 15). This is confusing. Why did the formatting change? Do I need timedateparams? Time comes from the join operation - it's not in my krill_lengths files.

7/30/08:
Reordered krill data so it is by station rather than tow#
Made companion pages for info files:
kdaly_dmo-notes.html - corrections/changes I made to the data.
kdaly_PI_notes.html - some notes from Kendra on the measuring of the krill.
** thesaurus needs to be edited: time_start_local
Z: abundance is sometimes 0 and other times 0.0000 - make consistent. Fixed, 7/31/08
done - K: time format reverts to integers each time it's put into a new excel sheet. I reformatted time, lat, lon and some other columns and then saved the join file and did rs. 7/31/08

7/31/08:
Got the join to work - the problem was that I needed to *add space* in the command line after 'rs' (see 7/7-8/08 above lines 212-220).
Number of lines, including variables list but not header comments:
2001 krill: 11305 lines
2002 krill: 7523 lines
zooabund (zooabund_new_njc.dat): 28504
**both info files: correct paths to links: kdaly_dml_notes.html and kdaly_pi_notes.html

8/1/08:
Dicky noticed that the Z object was acting very strange - it worked fine if you click on next level but if you click on a particular sample, once you get to taxon, you get nothing. I found that a SPACE after the taxon_group values Euphausiacea and Other_Zooplankton was the culprit.

Edited .objects and .remoteobjects on /data5/globec/objects/globec/soglobec with new file names:
krill=defgb({jgof_read.pl(/data12/sodata/kdaly/mocdata/revised_data_2008/krill_join_rs.txt)})
zooabund_lmg=rs(defgb(@/data12/sodata/kdaly/mocdata/revised_data_2008/zooabund_new.ind),0:cruiseid,0:year,1:station,2:tow,2:month_local,2:day_local,2:time_start_local,2:time_end_local,2:event,2:lat,2:lon,3:net,3:depth_open,3:depth_close,3:vol_filt,3:displ_vol,4:taxon_group,5:taxon,6:stage,7:*)

Z: An earlier sort operation in the excel file caused the taxon_group to get shuffled. This caused duplicate listings of the taxon_group within one net. I re-sorted and this seems to have fixed things.
-----------
8/22/08:
Got email back from Marina Marrari (mmarrari@marine.usf.edu) with answers to my questions on stations.

Hello Nancy,
1) MOC-13 and MOC-14 from 2001 are Station 4. The map you sent did not come through so I can't see exactly where they are in your plot, but from our own map they would be Sta4.
2) MOC-15 in 2002 is not Sta5, we ended up deciding to call it "Adelaide Is." but you are right, the eventlog has no station assigned to that cast (it was not done at any of the predetermined stations).
The rest of your changes look fine to me. Thanks for making sure everything is in the same form for K and Z data.
Marina

I changed Krill 2002 M15 station from nd to Adelaide_I in (krill_join.txt and) krill_join_rs.txt

20080825:
I changed Krill 2002 M15 station from nd to Adelaide_I in zooabund_new_njc.dat
Lats differ for M15 zooabund and krill/eventlog. I plotted up both positions plus the start and end positions. The pos for Z is along the route of the start/end line from eventlog so probably ok, but they should agree.

9/18/08:
KRILL: Back to this after a diversion. OK, I put MOCs 13 and 14 (2001) back to station 4. Then I moved them so they go after the other station 4 tows, 10-12. Then I reran the version 2 restructure rs (see 7/7-8/08). Once I got all the spaces where they should go, it worked and I got a new file that served a flat file of 18827 lines.
Corrected 2002 MOC-13 to station 2 - I think I edited it to station 4 by accident yesterday or today.
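
The stray-space problem found on 8/1/08 (a space after Euphausiacea and Other_Zooplankton) is hard to see in a spreadsheet or text editor. A minimal Python sketch of a check that could be run on a tab-delimited file before serving it; the function name and the sample rows are hypothetical and only illustrate the idea.

```python
import csv
import io


def find_padded_fields(text, delimiter="\t"):
    """Return (line_number, column_name, raw_value) for every field that
    has leading or trailing whitespace. Line 1 is the column header."""
    reader = csv.reader(
        io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE
    )
    header = next(reader)
    hits = []
    for line_number, fields in enumerate(reader, start=2):
        for name, value in zip(header, fields):
            if value != value.strip():
                hits.append((line_number, name, value))
    return hits


# Illustrative rows only: two taxon_group values carry a trailing space.
SAMPLE = (
    "taxon_group\ttaxon\n"
    "Copepoda\tUnidentified_or_Other\n"
    "Euphausiacea \tUnidentified_or_Other\n"
    "Other_Zooplankton \tUnidentified_or_Other\n"
)

hits = find_padded_fields(SAMPLE)
for line_number, name, value in hits:
    print(f"line {line_number}: {name} = {value!r}")
```

Printing the value with repr makes the trailing space visible inside the quotes.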