I finally got around to working on this project. It percolated in my head for a while, and then I fought with it for a while (I'm not really a programmer, just a wannabe lol). But I feel pretty confident that this data is correct now. In my slave job I do a lot of data processing, so this is like a fun puzzle to me.
I'm attaching the stitched (synthetic) data that contains the following:
exactly 7 weeks of EURUSD m1 data beginning Sunday 1/8/2012 @ 23:00 GMT
exactly 7 weeks of GBPUSD m1 data
exactly 7 weeks of USDCHF m1 data
exactly 7 weeks of USDJPY m1 data ending Friday 7/20/2012 @ 21:59 GMT
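The start and end dates above can be checked with a little arithmetic. The sketch below is not part of my script: it assumes every block starts Sunday 23:00 GMT and ends with a Friday 21:59 bar, and applies the same 0/7/14/21-week shifts that processdata() uses. blockRange() is a hypothetical helper.

```php
<?php
// Sketch: where each 7-week block lands on the stitched timeline.
// Assumptions: all times are GMT/UTC, each block starts Sunday 23:00 and its last
// m1 bar is Friday 21:59, and a week is exactly 604800 seconds (no DST in UTC).

const WEEK = 604800; // 7 * 24 * 60 * 60 seconds
const SUNDAY_2300_TO_FRIDAY_2159 = 4 * 86400 + 22 * 3600 + 59 * 60; // 428340 seconds

function blockRange(int $firstBar, int $shiftWeeks, int $weeks = 7): array
{
    $start = $firstBar + $shiftWeeks * WEEK;                          // first bar of this block
    $end = $start + ($weeks - 1) * WEEK + SUNDAY_2300_TO_FRIDAY_2159; // last bar of this block
    return [gmdate('D Y.m.d H:i', $start), gmdate('D Y.m.d H:i', $end)];
}

$firstBar = gmmktime(23, 0, 0, 1, 8, 2012); // Sunday 1/8/2012 23:00 GMT
$shifts = ['eurusd_m1' => 0, 'gbpusd_m1' => 7, 'usdchf_m1' => 14, 'usdjpy_m1' => 21];

foreach ($shifts as $table => $shiftWeeks) {
    [$from, $to] = blockRange($firstBar, $shiftWeeks);
    echo "$table: $from -> $to\n";
}
```

For usdjpy_m1 it gives Sun 2012.06.03 23:00 -> Fri 2012.07.20 21:59, matching the end date in the list, which suggests all four pairs were cut from the same 7 real weeks and simply laid end to end.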
There are 197,691 bars in this synthetic dataset. Prices have been normalized so that the price MOVEMENT is preserved across the seams: each dataset is shifted up or down so that its first open equals the previous dataset's last close. Yen pairs are divided by 100 before normalizing, so a yen pip (0.01) becomes the same size as a pip on the other pairs (0.0001). I worked very hard on this normalizing portion to make sure it was "correct" and takes into account ONLY the pip movement of the pairs, which I think is the most important aspect.
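Here is a stripped-down sketch of that normalizing step, working on in-memory bars instead of MySQL tables so it can be tried on a few numbers. It keeps the script's "first open above 10 means a yen pair" test; the rounding to 5 decimals is my own addition, and normalizeBlock() is a hypothetical name.

```php
<?php
// Sketch of the seam normalization on in-memory bars (each bar = [open, high, low, close]).
// Assumptions: a first open above 10 marks a yen pair (same test as the script), and
// prices are rounded to 5 decimals, which the original script does not do.

function normalizeBlock(array $bars, ?float $prevClose): array
{
    $divisor = $bars[0][0] > 10 ? 100 : 1; // 76.90 JPY -> 0.7690, so a 0.01 yen pip becomes 0.0001
    $firstOpen = $bars[0][0] / $divisor;
    // Additive shift that puts this block's first open on the previous block's last close
    $offset = ($prevClose === null) ? 0.0 : $prevClose - $firstOpen;
    $out = [];
    foreach ($bars as $bar) {
        $out[] = array_map(function ($price) use ($divisor, $offset) {
            return round($price / $divisor + $offset, 5);
        }, $bar);
    }
    return $out;
}

$eur = [[1.2700, 1.2710, 1.2695, 1.2705], [1.2705, 1.2720, 1.2700, 1.2715]];
$jpy = [[76.90, 76.95, 76.85, 76.92]];

$a = normalizeBlock($eur, null);                 // first block: no shift
$b = normalizeBlock($jpy, $a[count($a) - 1][3]); // continue from EURUSD's last close
echo implode(', ', $b[0]), "\n";
```

The yen bar now opens exactly at EURUSD's last close (1.2715), and its 5-pip move from open to high (76.90 to 76.95) is still 5 pips (0.0005) after the divide and shift. Because the shift is additive, only the level changes, never the pip distances.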
I'm also attaching a screenshot of the Generator chugging away. This strategy is not "good enough" yet, but you can see that the price curve in the background (light grey line) looks normal.
Next steps for me:
Find a good strategy with the Generator
Test the strategy on non-synthetic (i.e. real-life) data for these pairs and compare stats
I believe this will be a good test to prove or disprove the theory that a strategy generated with synthetic-stitched data is useful.
Shortly thereafter, I plan to tweak the code a little and try the other variation I mentioned above. In that scenario, I will merge the data from all datasets minute by minute, probably by averaging each bar. The result should be about 50,000 synthetic bars (one 7-week span instead of four). There will be some issues to deal with, such as bars missing from one of the datasets.
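One possible way to do the per-minute averaging, as a rough sketch only: the input shape is hypothetical (one array per pair, keyed by 'Y.m.d H:i'), and it assumes each pair has already been divided by 100 where needed and shifted to a common starting price. A minute missing from one pair is averaged over the pairs that do have it; mergeByMinute() is a made-up name.

```php
<?php
// Sketch of a per-minute merge by averaging. Hypothetical input shape: one array per
// pair, keyed 'Y.m.d H:i' => [open, high, low, close], yen pairs already divided by 100
// and every pair already shifted to a common starting price.
// A minute missing from one pair is averaged over the pairs that do have it.

function mergeByMinute(array $pairs): array
{
    $sums = [];   // minute => running [open, high, low, close] totals
    $counts = []; // minute => how many pairs had a bar at that minute
    foreach ($pairs as $bars) {
        foreach ($bars as $minute => $bar) {
            if (!isset($sums[$minute])) {
                $sums[$minute] = [0.0, 0.0, 0.0, 0.0];
                $counts[$minute] = 0;
            }
            for ($k = 0; $k < 4; $k++) {
                $sums[$minute][$k] += $bar[$k];
            }
            $counts[$minute]++;
        }
    }
    ksort($sums, SORT_STRING); // 'Y.m.d H:i' keys sort chronologically as strings
    $merged = [];
    foreach ($sums as $minute => $sum) {
        $n = $counts[$minute];
        $merged[$minute] = array_map(function ($total) use ($n) {
            return round($total / $n, 5);
        }, $sum);
    }
    return $merged;
}

$eur = [
    '2012.01.09 00:00' => [1.2700, 1.2710, 1.2690, 1.2705],
    '2012.01.09 00:01' => [1.2705, 1.2715, 1.2700, 1.2710],
];
$gbp = [
    '2012.01.09 00:00' => [1.2710, 1.2720, 1.2700, 1.2715], // no 00:01 bar
];

foreach (mergeByMinute([$eur, $gbp]) as $minute => $bar) {
    echo $minute, ' ', implode(' ', $bar), "\n";
}
```

Averaging highs with highs and lows with lows keeps every merged bar valid (its high is never below its open or close), since each input bar already satisfies that. The catch is the missing-bar case: when the number of contributing pairs changes from one minute to the next, the average can jump, so carrying the last known bar forward for the missing pair may be the better choice.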
<?php
//stitch together data sequentially
//FSB probably doesn't care about weekends or leap years, so don't worry about them for now
//prerequisite: ensure all datasets begin/end at exactly start/end of week
//login to database (mysql_* functions: PHP 5.x only, removed in PHP 7)
$linkid = @mysql_connect("localhost", "root", "") or trigger_error(mysql_error(), E_USER_ERROR);
mysql_select_db("dusktrader", $linkid);
$stitched = array(); //holds all stitched, time-shifted bar data
$i=1; //array index, keeps track of stitched bars
//datasets to stitch together, each set prepared with exactly 7 weeks of data
processdata('eurusd_m1', 0); //EURUSD not shifted
processdata('gbpusd_m1', 7);
processdata('usdchf_m1', 14);
processdata('usdjpy_m1', 21);
//insert stitched data to MySQL (easy to export from MySQL in FSB-ready format)
print ($i - 1) . " bars processed<br>"; //$i is one past the last stitched bar
$j=1;
while ($j < $i)
{
$sql = "INSERT INTO stitched_m1 " .
"(feeddate, feedtime, open, high, low, close, vol) " .
"VALUES (" .
"'".$stitched[$j]['feeddate']."'," .
"'".$stitched[$j]['feedtime']."'," .
$stitched[$j]['open']."," .
$stitched[$j]['high']."," .
$stitched[$j]['low']."," .
$stitched[$j]['close']."," .
$stitched[$j]['vol'] .
");";
$resultid = mysql_query($sql, $linkid);
$j++;
}
exit;
//read and process bar data from an existing table
//shiftweeks specifies the amount of calendar weeks data will be shifted
//from the original feeddate/feedtime
function processdata($table, $shiftweeks)
{
global $linkid, $stitched, $i, $prevclose;
//get initial opening value so we can calculate the difference
$sql = "SELECT open FROM $table ORDER BY id LIMIT 1"; //first record (does not rely on ids starting at 1)
$resultid = mysql_query($sql, $linkid);
$resulttext = mysql_fetch_object($resultid);
if (floatval($resulttext->open) > 10) $divisor = 100; else $divisor = 1; //divide Yen pair values by 100
$open = floatval($resulttext->open) / $divisor;
if (!$prevclose) $prevclose = $open; //synthetic data start point
$difference = $prevclose - $open; //shift that puts this dataset's first open exactly on the previous close
//loop through all data with price normalizing and time shifting
$sql = "SELECT * FROM $table ORDER BY id"; //bars must come back in time order; ******** APPLY LIMIT HERE for testing
$resultid = mysql_query($sql, $linkid);
while($resulttext = mysql_fetch_object($resultid))
{
$open = floatval($resulttext->open) / $divisor;
$high = floatval($resulttext->high) / $divisor;
$low = floatval($resulttext->low) / $divisor;
$close = floatval($resulttext->close) / $divisor;
$timeshifted = timeshift($resulttext->feeddate, $resulttext->feedtime, $shiftweeks);
$stitched[$i]['table'] = $table;
$stitched[$i]['feeddate'] = $timeshifted['feeddate'];
$stitched[$i]['feedtime'] = $timeshifted['feedtime'];
$stitched[$i]['open'] = ($open + $difference);
$stitched[$i]['high'] = ($high + $difference);
$stitched[$i]['low'] = ($low + $difference);
$stitched[$i]['close'] = ($close + $difference);
$stitched[$i++]['vol'] = $resulttext->vol;
}
$prevclose = $close + $difference; //to be used on next dataset
}
//parse and shift time by specified weeks into future
function timeshift($feeddate, $feedtime, $shiftweeks)
{
$unixtime = strtotime(str_replace(".","-",$feeddate)." ".$feedtime." UTC"); //feed times are GMT; parsing as UTC keeps DST out of the week math
$unixtime += (604800 * $shiftweeks); //# weeks into future (604800 = 7*24*60*60 seconds)
$timeshifted['feeddate'] = gmdate('Y.m.d', $unixtime);
$timeshifted['feedtime'] = gmdate('H:i', $unixtime);
return $timeshifted;
}
?>
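Before exporting to FSB, a quick sanity check on the stitched array can catch seam or time-ordering mistakes without going back to MySQL. This is a sketch, not part of the script above: checkStitched() is a hypothetical helper that expects bars shaped like the $stitched entries (keys 'table', 'feeddate', 'feedtime', 'open', 'close') and checks that each new table's first open equals the previous table's last close and that timestamps strictly increase.

```php
<?php
// Sketch of a sanity check for stitched bars shaped like the script's $stitched entries.
// Checks: (1) at each table change, first open == previous last close (within tolerance);
//         (2) 'Y.m.d H:i' timestamps strictly increase (string order == time order).

function checkStitched(array $stitched, float $tolerance = 1e-9): array
{
    $problems = [];
    $prev = null;
    foreach ($stitched as $idx => $bar) {
        if ($prev !== null) {
            if ($bar['table'] !== $prev['table'] && abs($bar['open'] - $prev['close']) > $tolerance) {
                $problems[] = "bar $idx: seam gap between {$prev['table']} and {$bar['table']}";
            }
            $t = $bar['feeddate'] . ' ' . $bar['feedtime'];
            $tPrev = $prev['feeddate'] . ' ' . $prev['feedtime'];
            if (strcmp($t, $tPrev) <= 0) {
                $problems[] = "bar $idx: time $t does not follow $tPrev";
            }
        }
        $prev = $bar;
    }
    return $problems;
}

$sample = [
    1 => ['table' => 'eurusd_m1', 'feeddate' => '2012.02.24', 'feedtime' => '21:59', 'open' => 1.3450, 'close' => 1.3452],
    2 => ['table' => 'gbpusd_m1', 'feeddate' => '2012.02.26', 'feedtime' => '23:00', 'open' => 1.3452, 'close' => 1.3449],
];
echo count(checkStitched($sample)), " problems\n";
```

Running it over the full $stitched array (for example, just before the INSERT loop) should report zero problems if the seams and time shifts are right.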