29 Jul 2011

Gravity terrain correction code testing using UTEP (PACES) DB of N
American stations.

Yesterday, tried to run the whole 1.28 million stns in one go; terra
ran out of RAM at 8.2 GB allocated and swapping was horrible.  So,
let's just do a subset at a time....

Use shell script ./grab_stns N S E W to get blocks of stations and run
them, then compare results to run of same block through Plouff
terrain_correct in ~/devel/gravity/terrain....

Blocks: [N.B. - my Plouff TC maps start at 100 W or 100.25 W!!!!]
  North	South	East	West
  43	35	-112	-115
  47	42	-112	-120
  47	44	-101	-110
  35	32	-109	-112
  45	42	-109	-112

For each block, extract UTEP stns, calc terrain corrections, and save
results.  Do this in a script, cuz I'll have to do it many times
anyway:
  [terra:25] ./run_UTEP_stns 
  (this takes a really long time)

And, we need to run these same blocks through the USGS code, and do
this with a script because it is even more annoying:
  [terra:54] time ./run_UTEP_USGS 
  (this run takes like 1-1.5 min!)

Since the run with my current TC code is taking so long, let's change
the script to start 2 in the background, 1 in the foreground, and thus
parallelize a great deal of this.  Also, tweak makefile above to add
optimization options...

Restart 5 blocks (2 in bg, 1 fg, then 1 bg, 1 fg) at 14:13:30 29 Jul
2011; note I/O waiting for loading the DEM data, then all 3 terrain
procs cranking at 100% CPU, 1 core free. Gotta love having 4 cores to
play with.....

  [that run took _days_ to finish all 5 chunks; longest running chunk
  had CPU time (from top) of 1430+ min!]

So, there are problems with scaling by N*M (# stns X # DEM polys) to
large areas and numbers of stations.


2 Aug 2011

To fix, let's use an oct-tree to partition DEM polys and then extract
polys within range from the octree....

Start coding this using utah-g3d octtree.cpp/h as basis, but altering
to use geopoly types, and store geopolygon stuff (id, center, verts,
values, masks, etc.) in leaf nodes, so can then read it back out when
searching.  Otherwise have to have hash of geopolys by id; way more
work than just storing geopoly dem info and then creating a local
GlobalGrid for the octree leaf nodes in range of given location.
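In sketch form (Python here for brevity; the real code is C built from
utah-g3d's octtree.cpp/h, and the payloads are full geopolygon records,
not just the ids in this toy), the scheme is: store the payload right
in the leaf, prune boxes outside the search radius, and read payloads
straight back out of any leaf in range - no separate id hash needed.
Names and the split threshold are invented:

```python
import math

MAX_ITEMS = 8  # leaf split threshold (illustrative, not the real value)

class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi   # bbox corners, each an (x, y, z) tuple
        self.items = []             # (point, payload) pairs while a leaf
        self.kids = None            # 8 child Nodes once split

    def insert(self, pt, payload):
        if self.kids is None:
            self.items.append((pt, payload))
            if len(self.items) > MAX_ITEMS:
                self._split()
        else:
            self._child(pt).insert(pt, payload)

    def _split(self):
        # halve the box in each dimension; bit d of the child index
        # selects the lower (0) or upper (1) half in dimension d
        mid = tuple((l + h) / 2 for l, h in zip(self.lo, self.hi))
        self.kids = []
        for i in range(8):
            lo = tuple(self.lo[d] if not (i >> d) & 1 else mid[d]
                       for d in range(3))
            hi = tuple(mid[d] if not (i >> d) & 1 else self.hi[d]
                       for d in range(3))
            self.kids.append(Node(lo, hi))
        old, self.items = self.items, []
        for pt, payload in old:
            self._child(pt).insert(pt, payload)

    def _child(self, pt):
        mid = tuple((l + h) / 2 for l, h in zip(self.lo, self.hi))
        return self.kids[sum(1 << d for d in range(3) if pt[d] >= mid[d])]

    def query(self, center, radius, out):
        # squared distance from center to this node's box; prune if
        # the whole box is outside the search sphere
        d2 = sum(max(l - c, 0.0, c - h) ** 2
                 for l, h, c in zip(self.lo, self.hi, center))
        if d2 > radius * radius:
            return
        if self.kids is None:
            for pt, payload in self.items:
                if math.dist(pt, center) <= radius:
                    out.append(payload)
        else:
            for k in self.kids:
                k.query(center, radius, out)
```

A query then returns exactly the payloads a brute-force scan over all
polys would, but only visits the leaves whose boxes touch the sphere.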


3 Aug 2011

Finish coding, and start testing.  Initial test ran long time on first
station - infinite loop in linked-list-to-GlobalGrid conversion! Oops.
Fixed.

Start run for time testing using small block 43 to 44 N, -112 to -113
E; 1660 stations, 1.6 million DEM polys.  Currently looks like a run
time of 30-40 min.  But! This new scheme should scale way, way better
for larger chunks, and degrade to older, test-every-block scheme for
small areas where a tiny hit won't matter; and I get to claim I'm using
binary space partitioning into an octree, which just sounds cool.

Also, a 1x1 deg block takes ~1.4 GB RAM while running - no appearance
of memory leaks, so should (in principle) be able to scale to about 2x3
deg chunks without RAM issues.  Thus, maybe have to run fewer parallel
terrain correction chunks to keep swap from being dominant time loss.

Wait for this test run to finish, and then spool up the real test.

Preliminary look at terrain correction differences from last run and
USGS show some large difference stations - 10s of mGals!  So, will have
to extract some of these stations and take a look in detail to see
where things might be breaking....

Total run time for 1x1 deg test chunk: 46m49.51s, 44m55.74s user CPU
time, roughly 3s of I/O wait....

So, spin up the new(er) version of run_UTEP_stns and do something else
until tomorrow (maybe)....
  [terra:17] nohup time ./run_UTEP_stns &
  [1] 4074


Note that currently running first 2 chunks: 38 to 35, -112 to -115 and
47 to 45, -116 to -120 and using 2.1 to 2.4 GB for each process; so could
run larger blocks or a 3rd proc at once.  Both procs have run for 45 min
and appear to be 7.5% and 25% done; 38 to 35 block has 14285 stations,
47-45 has 4578 stations.  Looks like both runs are cranking out a
station in about the same time, which is what I expect for the
octree-enabled version.

First chunk is done - 4578 stations in roughly 3 hrs, 3.5 hrs == 37%
done for 14285 stations ==> 585 min for full run = 9.75 hours. Ick.
Better than the last try (1400+ min!), but still not great.  Process is
still holding at 2.1-2.4 GB of RAM, so clearly don't have significant
memory leak. Yay!  At least it now appears to scale with # of stations
more than with size of DEM bbox....

Actual times for the runs, in finish order:

real	188m2.402s	4578 stns	2561526 polys
user	173m34.607s
sys	9m43.892s

real	519m42.029s	14285 stns	2710916 polys
user	490m56.361s
sys	24m4.398s

real	210m3.043s	5369 stns	2657218 polys
user	195m51.734s
sys	9m55.777s

real	900m51.493s	22643 stns	2551898 polys
user	854m38.705s
sys	41m25.679s

real	402m26.831s	10008 stns	2971850 polys
user	382m2.637s
sys	19m25.193s



5 Aug 2011

All runs complete, start looking at large differences and why uGrav and
USGS don't match.....

So, start with stations with large and small deltas:
  awk 'sqrt($5*$5)<0.1{print;next;}' tc.diffs > tc.small.diffs
  awk 'sqrt($5*$5)>5{print;next;}' tc.diffs > tc.large.diffs

Convert file of large delta records for Octave use; drop name:
  awk '{print $2,$3,$4,$5,$6,$7}' tc.large.diffs > tc.large.diffs.no_name

Now plot the extracted terrain (terrain_extract.out), the
zero-elevation entries in the terrain, and the stations with really big
TTCs....


Rebuild diffs file for Octave use, with all stns:
  awk '/^#/{next;}{print $2,$3,$4,$5,$6,$7}' tc.diffs > tc.diffs.no_name


And, now I think the USGS results are wonky, so compare with the
terrain corrections supplied by UTEP from their web page.  Go to
~/devel/gravity/GravDB/UTEP and make a single file with all UTEP
stations, and only one header.  Then make a python script based on
compare_tcs.py for the UTEP format and run it on the whole TC output:
  [terra:81] ./comparewutep.py UTEP.stationlist UTEP.my_tc > tc.diffs.utep

And, the difference numbers look way smaller, which implies uGrav TC
not so bad, and USGS numbers weird.  Since the extracted terrain data
has lots of artifacts (last or first column of maps swapped with next
map?), including map corners with elevations of 0 (in the Rockies, for
example), maybe USGS ttc not so good.

Change stuffnthings.m to use UTEP vs. uGrav difference file and
replot....

This time, stuffnthings.m loads the diff file and gives a histogram of
differences (column 4); note that out of 56883 stations, only 248
stations have a difference >10 mGal (absolute)! One station has a
difference of 240 mGal, and the minimum is -19.6 mGal! So, uGrav TTC is
really close to the UTEP result.  Histogram shows >35k stations within
1 mGal difference!  Since most of these stations have corrections of >1
mGal, this is good!

So, uGrav TTC is actually probably good to go, but will check with the
entire UTEP DB by running 10x20-30 deg chunks at a time.... It's
Friday, so let it burn for the weekend, and the octree version will
REALLY get a good testing this time....

And I killed it; chunks taking 6+ GB each, so either run 1 at a time,
or lots more little chunks....  Let's do lots more little chunks since
I can spin up 3 chunks at a time if they are small enough; 3x3 deg was
2.4 GB, so run as 3 chunks of 5x10 deg again and again and again....

16:50 5 Aug 2011 - start script to run ALL UTEP stations in 5x10 chunks
for comparison as with small bits above.  Command:
  [terra:7] nohup time ./run_all_UTEP &

output files are allutep*tc concatenated into UTEP.my_tc.all; then use
comparewutep.py to compare all the stations and then generate histogram
with octave....


8 Aug 2011

Still running after all weekend, b/c need more RAM to run large (5x10)
blocks; some take 4-8 GB of RAM.

So, change run_all_UTEP to run one job at a time; update terrain.c to
stop before loading DEM if no stations in station file.  First block
with stations: 30 to 25, -80 to -90 == 16793 stations, 7 265 197 polys,
7609 MB RAM used(!)

This is currently at 14% done with 314 min runtime ==> 2243 min full
run = 37 hours!

So, maybe need to run 5x5 chunks or less (1x1 in a for loop?)


Start a new version of run_all_UTEP that sets up 3 subshells for 10
degree bands, running 1x1 degree chunks 3 at a time; should always have
3 procs running; 1x1 deg blocks appear to take 1-1.5 GB RAM each, so
using 6-6.8 GB of RAM total....


11 Aug 2011

Found bug in octree.c - didn't assign NULL to the next pointers of
newly allocated linked-list entries, so got occasional segfault/double
free errors: when a new entry landed in a reused memory block, its
->next held stale garbage instead of NULL, and freeing list->next then
freed memory that was never part of the list.  But, this means some
files in all/ have no terrain corrections for stations, because the run
only got partway through before dying with a segfault or glibc double
free/corruption error.

To find a list of files where this might be the case:
  [terra:20] for i in allutep_*[0-9]; \
    do I=`ls -s $i|awk '{print $1}'`; \
    J=`ls -s $i.tc|awk '{print $1}'`; \
    if [ $I -gt $J ]; then echo $i;\
    fi; done > broken_files
So, all/broken_files has the list of station files where the terrain
correction output is not larger than the input file (the output should
be larger, b/c some stations will have a ttc that takes more space than
the 0.000 placeholder in the input files).

So, check the broken_files list with ls:
  [terra:27] cat broken_files | xargs ls -l
  [terra:28] cat broken_files | awk '{print $0 ".tc"}' | xargs ls -l

Looks like need to run all those files again. Lots of the tc files are
clearly partial outputs, or no output (due to buffering, I assume).

So, run all these in turn using a single core; run_all... is currently
using 2, so have one free to use:
  [terra:37] for i in `cat broken_files`; do ../../terrain $i > $i.tc; done


22 Aug 2011

Looks like the run kept going on the 30-39 block into the 40s; stop the
script and start the last 3 entries again to recover tc files: 42 to 41
N, 110 to 113 W.  Then can concatenate all files and look for
differences with the UTEP PACES file.

  [terra:5] cat all/allutep_*.tc > UTEP.my_tc.all
  [terra:12] ./comparewutepdb.py UTEP_gravdb UTEP.my_tc.all > tc.diffs.allUTEP
  [terra:28] awk '{print $2, $3, $4, $5, $6, $7}' tc.diffs.allUTEP > tc.diffs.allUTEP.no_name

Plot a couple histograms in octave: plot_tc_diffs

Some useful stats:
  # of stations: 1159984
  # of stations with my TC differing from UTEP by >1 mGal: 161494
  # stations with TC diff<=1: 955999
  fraction of stations with diff>1 mGal: 0.139221

Plots of stations with differences of >1, <=1 mGal show stripes of
missing stations between 39, 40 N and 29, 30 N; also strip between 78,
79 W.  So, run individual strips to fill in holes with:
  [terra:28] ./run_UTEP_strips 

Fix run_all_UTEP so I only might have to rerun the 79-80 N-S strip; not
sure why that chunk wasn't done; no stations in UTEP?

Also, many stations with clearly wrong TCs - result is nan or >1e3
mGal; many look like memory corruption.  Reran one by hand, and all is
well (TC is 0.031 mGal).  So, get list of stations, extract from
UTEP_gravdb, and run them again:
  [terra:19] awk '$5>1e6{print $1}' tc.diffs.allUTEP > stns.run_again
  [terra:17] ./get_stations_by_id.py stns.run_again UTEP_gravdb > stns.to_rerun 
total of 43377 stations to re-run....

most stations in strip between -88,-90; -77,-79
split stns.to_rerun into 4 chunks: <-80, >-80; <39, >=39 with awk:
  [terra:26] awk '$2<-80&&$3>=39{print}' stns.to_rerun > stns.rerun.1
  [terra:28] awk '$2<-80&&$3<39{print}' stns.to_rerun > stns.rerun.2
  [terra:29] awk '$2>-80&&$3>=39{print}' stns.to_rerun > stns.rerun.3
  [terra:30] awk '$2>-80&&$3<39{print}' stns.to_rerun > stns.rerun.4
rerun each chunk alone, since already running 3 TC procs for strips...
  [terra:31] ../terrain stns.rerun.1 > stns.rerun.1.tc
	results are coming out as values near 0 (<1 mGal), so problems
	weren't from bad calculations; must be a memory problem....
	N.B. - this chunk takes 4GB RAM during run; 7x11 deg box
	(due to the 166.7 km TC radius). Run time of 0:7:30 or so...
and run more chunks:
  [terra:32] ../terrain stns.rerun.2 > stns.rerun.2.tc
  [terra:33] ../terrain stns.rerun.3 > stns.rerun.3.tc
BUT GET LOTS OF ERRORS from greatCirclePtLon:
  greatCirclePtLon: Qp.lat=1.36150979e-09 ==> rotation matrices might be wrong!
and TCs come back as nans!!!!!
why are these entries dying?

  [terra:34] ../terrain stns.rerun.4 > stns.rerun.4.tc
and all TCs are showing nan!

So, grab a station from each chunk that gives a nan and check it out:
  [terra:35] grep X549533 UTEP_gravdb > zap
  [terra:36] grep X1112444 UTEP_gravdb >> zap
  [terra:37] cat zap
  X549533   -79.170438   33.368668   1.800 979560.960 0.000  0.000 -0.010    65832G02
  X1112444  -77.216372   40.001260 153.310 980091.350 0.000  2.930  3.670    2274A179
tweak terrain.c to check for nan, drop fabs() call for negative TTC...
  [terra:38] ../terrain zap > zap.tc
and results show up as nan from ocean and land cells. Cells have elevations of
-395 to -427 m; zmin, zmax seem to be correct.

When I put traps to ignore nans for all cases, output results look like
they'll be really, really close to UTEP.  So, what is problem with
polys that creates nan entries?

Now rerun blocks with nans....

Note that many nans come from an incorrect choice of polygon points for
interpolating latitudes.  Other nans come from taking acos() of an
argument larger than 1. So, trapped the acos() arg and capped it at a
max of 1; leave the traps for nans in integration (don't add them);
rounding of the acos() arg probably also leads to all those
greatCirclePtLon messages - turn them off in the code....
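The two traps together, sketched in Python (invented names; terrain.c
does the equivalent in C):

```python
import math

def safe_acos(x):
    # Rounding can push |x| just past 1 when the true angle is near
    # 0 or pi; clamp the argument instead of letting acos() go nan.
    return math.acos(max(-1.0, min(1.0, x)))

def integrate(terms):
    # Trap nans in the integration: skip them rather than let one
    # bad polygon poison the whole terrain correction sum.
    return sum(t for t in terms if not math.isnan(t))
```

The clamp removes the acos-domain nans at the source; the integration
trap just drops whatever nans remain (the long-range polys noted below).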

Rerunning terrain corrections for blocks 3 and 4 takes 4-8 hours (10k
stns or more) each.

Note that nan entries are only seen with polys >130 km from the
station, in some areas where the x,y,z coords for rotation come out
really close to 1,0,0 or equiv.... So the "dropped" polys don't
contribute much to the terrain correction anyway; changes are in the
uGal range or smaller.... Also note that a few dozen polys out of many
thousands at long range are irrelevant.

Once the calculations are done, compare again between UTEP published
numbers and mine with comparewutepdb.py and then replot the country...


24 Aug 2011

Because of the need to rerun chunks/strips, have to use a different
command to concatenate all results together:
  [terra:53] cat all/allutep_*.tc stns.rerun.*.tc | grep -v " nan" > UTEP.my_tc.all 
 (the grep drops entries from all/*tc that didn't work with nan
 results)

Then, compare to UTEP results:
  [terra:53] ./comparewutepdb.py UTEP_gravdb UTEP.my_tc.all > tc.diffs.allUTEP
and create file without name fields for use in Octave:
  [terra:53] awk '{print $2, $3, $4, $5, $6, $7}' tc.diffs.allUTEP > tc.diffs.allUTEP.no_name 

And then plot histograms for distributions of changes in Octave:
  plot_tc_diffs (uses tc.diffs.allUTEP.no_name)

but max, min show results of +-3e34!!! So, still have stations with
insane results.  Let's find them:
  [terra:58] awk 'sqrt($8*$8)>100{print}' UTEP.my_tc.all > tc.large.diffs

OK - large diffs are results from previous run(s); so need to replace
all/all*tc stations with rerun results.  Need a python script for this,
and then make one big file of replacement stations.
  [terra:65] cat all/allutep_*.tc | grep -v "nan" > UTEP.my_tc.all
  [terra:66] cat stns.rerun.[1-3].tc > UTEP.my_tc.rerun
  [terra:67] ./replace_stns.py UTEP.my_tc.all UTEP.my_tc.rerun > zap
  [terra:68] mv zap UTEP.my_tc.all
  [terra:69] ./comparewutepdb.py UTEP_gravdb UTEP.my_tc.all > tc.diffs.allUTEP
  [terra:70] awk '{print $2, $3, $4, $5, $6, $7}' tc.diffs.allUTEP > tc.diffs.allUTEP.no_name 
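The replacement step is simple enough to sketch; this is my guess at
what replace_stns.py needs to do (whitespace-separated records keyed on
the station id in column 1, # comment lines passed through; the real
script may differ):

```python
import sys

def replace_stations(master_lines, rerun_lines):
    """Yield master records, substituting any station that also
    appears in the rerun set (keyed on the id in column 1)."""
    rerun = {}
    for line in rerun_lines:
        if line.strip() and not line.startswith('#'):
            rerun[line.split()[0]] = line
    for line in master_lines:
        if line.strip() and not line.startswith('#'):
            yield rerun.get(line.split()[0], line)
        else:
            yield line  # pass headers/blank lines through untouched

if __name__ == '__main__' and len(sys.argv) == 3:
    # usage: replace_stns.py master_file rerun_file > merged_file
    with open(sys.argv[1]) as m, open(sys.argv[2]) as r:
        sys.stdout.writelines(replace_stations(m, list(r)))
```

Keying on the id means rerun order doesn't matter, and stations that
were never rerun come through unchanged.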

Will need to rerun once stns.rerun.4 is finished; another few hours I
think.

Some rerun stations have corrections of >10 000 mGal; memory leak/error
somewhere; could be fixed by now.  Extract from UTEP.my_tc.rerun and
run just those 4 stations:
  [terra:93] awk 'sqrt($8*$8)>1e4{print}' UTEP.my_tc.rerun > zap
  [terra:94] ../terrain zap
And then update stns.rerun.*.tc to use the new values (<1 mGal for all
4).

Rebuild master tc file (UTEP.my_tc.all), replace with rerun stations,
rerun comparewutepdb.py, and regenerate tc.diffs.allUTEP.no_name.

Plot histogram with plot_tc_diffs; still have stations with insane TCs;
extract:
  [terra:110] awk 'sqrt($8*$8)>1e4{print}' UTEP.my_tc.all > tc.large.diffs
there are 116 records in this file.  And, these stations are not in the
rerun file.  All appear to be in a strip from 28 to 47 N, -89 to -90 E.
So, let's rerun these as stns.rerun.5 and then add them to the station
replacement file...

Found and fixed a bug in spherical.c - flipped lat, lon in the
calculation of x,y,z from lon,lat,h; compared llh2xyz with sph2xyz and
flipped the coords in llh2xyz to make the two nearly match (from
differing by 100-10 000x to differing by meters out of 100s of km).
This problem showed up in stns.rerun.5 as stations with only 1500 or so
polys run, which is clearly silly.  This was a real problem only for
stations with lons near -90; the x,y coords were clearly off (x & y
should not both be <3000 km).  So, rerun stns.rerun.5 with the newest
terrain.  This could be the cause of some large diffs in the Rocky
Mtns, Sierra Nevadas....
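The symptom is easy to reproduce with a spherical-Earth sketch (Python,
invented names - not the actual spherical.c): with lat and lon swapped,
a station near lon -90 gets both x and y collapsed toward 0, because
the cos(lon) ~ 0 factor ends up multiplying both.

```python
import math

R = 6371.0  # mean Earth radius in km (spherical approximation)

def llh2xyz(lon_deg, lat_deg, h_km=0.0):
    """Geocentric x, y, z in km from lon, lat on a sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    r = R + h_km
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))

def llh2xyz_flipped(lon_deg, lat_deg, h_km=0.0):
    """The pre-fix behavior: lat and lon swapped on the way in."""
    return llh2xyz(lat_deg, lon_deg, h_km)
```

At lon -89.35, lat 36.44 (station X604490's location) the correct
version gives |y| of roughly 5100 km, while the flipped one gives |x|
and |y| both under 100 km - exactly the "x & y both <3000 km" symptom.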

Will need to extract and rerun stations with diffs more than 10%, say?

Added rerun.5.tc to UTEP.my_tc.rerun and rebuild difference files.
Now, differences range in [-6819.3, 9445.4]
  -6819.3 diff: X724973
  9445.4 diff: X592043


Now, check for differences larger than 100 mGal (absolute), and
extract:
  [terra:150] awk 'sqrt($5*$5)>1e2{print}' tc.diffs.allUTEP > tc.large.diffs
there are 221 stations in this file, scattered all across the U.S. (26
to 49 N, -72 to -120 E).  So, won't rerun this file in one piece;
memory requirement is more than terra's RAM....

Try running for one station
  X604490   -89.352069   36.443562  86.860 979832.330 0.000  0.000 -0.040    6404_ngs
which has listed TC from my code of 1118 mGal, -0.04 from UTEP...
New run shows a TTC of -0.007 mGal.  So, perhaps most/all large
differences are due to the x,y,z bug fixed above....

So, run each of the 221 stations alone:
  echo "# rerun 1 station at a time" > stns.rerun.6.tc; \
  for i in `awk '{print $1}' tc.large.diffs`; \
  do grep "$i " UTEP_gravdb > zap; ../terrain zap >> stns.rerun.6.tc; done

and then add stns.rerun.6.tc to replacement file, once stns.rerun.4 is
finished....

[terra:61] cat all/allutep_*tc | grep -v "nan" > UTEP.my_tc.all 
[terra:62] cat stns.rerun.*.tc > UTEP.my_tc.rerun 
[terra:63] ./replace_stns.py UTEP.my_tc.all UTEP.my_tc.rerun > zap
[terra:64] mv zap UTEP.my_tc.all
[terra:65] ./comparewutepdb.py UTEP_gravdb UTEP.my_tc.all > tc.diffs.allUTEP
[terra:66] awk '{print $2, $3, $4, $5, $6, $7}' tc.diffs.allUTEP > tc.diffs.allUTEP.no_name

minimum now -290.840, maximum now 632.018; so looks like it's down to
some individual stations with issues (bad elevations? etc.).

24705 stns with | diff | > 5 mGal, out of 1256741 or 1.97%

Build histogram bins:
  [terra:27] awk '{print $5}' tc.diffs.allUTEP | \
    ~/devel/histogram/makehist - -5 5 20 2> tc.diffs.hist

Plot with gnuplot (histogram.gpc) to make histogram.ps.



26 Aug 2011

Copy testdata (4 stations in Tucson) to testdata.edited and edit
elevations to match nearest DGG polygon DEM value; use
./get_dem_elevation lon lat (which calls findpolybyloc and
gridval2ascii to extract info from binary DGG files). Stn mghous had
largest change (-12 m!), and my TC results with new elevations:
  mghous 1.551
  tucabsex 0.146
  b847-1 -0.013
  tucgs87 -0.015

Compare to USGS results: 1.54, 3.43, 0.07, 0.17 respectively. Note that
I now doubt the USGS results for these stations as there could be a
problem with the maps being used to compute the USGS corrections.

So, let's try some of the UTEP stations with large differences:
  X7447 (471 mGal diff); DEM ele: 49 m; stn entry ele: 35.4 m
  Try editing stn elevation to DEM elevation and rerun TC. UTEP reports
  tc of 0.03 mGal. My TC for stn @ 49 m is -0.027 mGal.

So, setup a shell script to run through each of the 24705 "large diff"
stations and reset station entry to DEM elevation, and rerun TC and get
new difference file - tc.large.diffs.new

WARNING WARNING WARNING - NEW INSIGHT AND NEW VERSION OF TERRAIN!!!!

So, looked at a station with a computed terrain correction of -30 mGal,
and wondered WTF! Well, the sign comes from having delta radius
negative (polygon elevation way below stn elevation), and drho positive
(2670).  BUT, realized that to do things that way, need to swap signs
on drho depending on whether station is below or above a polygon;
stations below polygon use drho of 2670 (air->rock), stations above
poly use drho of -2670 (rock->air).  BUT, this is the same as using
rho=2670 and taking absolute value of radial difference (delta radius
swaps sign from drho in both cases).
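The equivalence in sketch form (Python; delta_r stands in for
rads[i]*cosdelta - r, and this is just the sign argument, not the
actual integrand, which has more geometry in it).  The point is that
the terrain correction contribution is positive whether the polygon is
above the station or below it:

```python
RHO = 2670.0  # rock density, kg/m^3

def term_signed(delta_r):
    # sign-swapping view: station below poly -> air->rock, drho = +2670;
    # station above poly -> rock->air, drho = -2670
    drho = RHO if delta_r > 0 else -RHO
    return drho * delta_r

def term_fabs(delta_r):
    # equivalent single-density view used in the rebuilt terrain.c:
    # rho = 2670 with fabs() around the radial difference
    return RHO * abs(delta_r)
```

Both give the same (non-negative) value for every delta_r, which is why
a single rho=2670 plus fabs() is safe here, and special to the terrain
correction.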

SO - rebuild terrain.c to use fabs() call around (rads[i]*cosdelta-r)
for calculation of each radial g term, and put in a big ol' comment on
how that works, and how it is special to terrain correction.  Now, this
means that lots (all) of negative terrain corrections will probably
swap sign and change mag a bit (no more offsetting numbers).  So,
restart the hideously slow rerun_large_diffs script (which will be done
next week, maybe).  Should really rerun the whole damn thing, but I
don't have a month.

But, reported agreement will be a worst-case scenario - if I can drop
all the large difference stations into the +-5 mGal band or better, I
should also expect most of the 1-5 mGal diffs to drop as well.  Could
just rerun stns with abs diff > 1 mGal? That turns out to be 182k
stations.  Maybe just set off run_all_UTEP and wait for the week; don't
have anything else that will need to crank for a bit.....

Start run_all_UTEP at 14:16 26 Aug 2011; good thing I will be in
Yellowstone from Sep 5 to 11.


12 Sep 2011

All UTEP stations finished latest run while in Yellowstone.  Rebuild
difference files, histograms:
  [terra:12] ./comparewutepdb.py UTEP_gravdb UTEP.my_tc.all > tc.diffs.allUTEP
  [terra:13] awk '{print $2, $3, $4, $5, $6, $7}' tc.diffs.allUTEP > tc.diffs.allUTEP.no_name
  [terra:17] awk '{print $5}' tc.diffs.allUTEP | \
    ~/devel/histogram/makehist - -5 5 100 2> tc.diffs.hist
  [terra:18] gnuplot histogram.gpc 

New stats: 8748 + 7962 = 16710 stations with | diff | >= 5 mGal
  out of 1 258 462 stations.
  821 294 stns with diff in [-0.2,0.2] mGal



27 Sep 2011

Let's make a set of histograms with terrain correction differences
scaled by the new TTC; typical assumption is that the TTC is really only
good to ~20%, so look for points outside +-0.2 scaled....

  [terra:12] awk '{print $5/$6}' tc.diffs.allUTEP | \
    ~/devel/histogram/makehist - -2 2 20 2> tc.diffs.hist.scaled

Not the best looking plot, so maybe don't include yet....

See histogram.scaled.ps, generated from histogram.gpc.
