NQM4 Tick Data Not Consistent

  • NQM4 Tick Data Not Consistent

    Check out the following discrepancies for NQM4 tick data.

    I opened an NQ M4 50T tick chart with 10 days of data on it.

    I downloaded the tick bars for this chart at around 12:30 PM CST. The first 20 bars of this download are shown below:

    Table A
    Date Time Open High Low Close Vol
    4/1/2004 0:52:00 144400 144550 144350 144450 210
    4/1/2004 1:27:55 144450 144600 144400 144600 209
    4/1/2004 1:40:02 144600 144600 144600 144600 230
    4/1/2004 1:51:06 144600 144600 144500 144550 218
    4/1/2004 2:00:20 144600 144650 144550 144600 112
    4/1/2004 2:19:07 144600 144700 144550 144550 157
    4/1/2004 3:02:01 144550 144650 144500 144500 134
    4/1/2004 3:56:46 144500 144500 144350 144350 249
    4/1/2004 4:07:38 144350 144350 144200 144300 229
    4/1/2004 4:24:43 144300 144400 144250 144300 233
    4/1/2004 4:51:18 144300 144350 144200 144200 278
    4/1/2004 5:16:06 144200 144250 144150 144200 459
    4/1/2004 5:37:59 144200 144350 144100 144200 169
    4/1/2004 5:52:24 144200 144250 144100 144100 165
    4/1/2004 6:01:31 144100 144150 143950 144000 261
    4/1/2004 6:12:18 144000 144050 143800 143850 253
    4/1/2004 6:18:10 143850 143950 143800 143850 194
    4/1/2004 6:23:11 143800 143950 143800 143900 306
    4/1/2004 6:28:18 143900 144000 143850 143900 160
    4/1/2004 6:46:15 143850 144000 143800 144000 140

    I then exited Esignal and restarted Esignal at around 11:30 PM CST. At this time I again opened an NQ M4 50T tick chart with 10 days of data on it. Although this chart also appears to start with the same bar, by bar 11 the 11:30 PM chart differs from the 12:30 PM chart.

    See the first 20 bars of the 11:30 PM chart below and compare them with the bars shown above:

    Table B
    Date Time Open High Low Close Vol
    4/1/2004 0:52:00 144400 144550 144350 144450 210
    4/1/2004 1:27:55 144450 144600 144400 144600 209
    4/1/2004 1:40:02 144600 144600 144600 144600 230
    4/1/2004 1:51:06 144600 144600 144500 144550 218
    4/1/2004 2:00:20 144600 144650 144550 144600 112
    4/1/2004 2:19:07 144600 144700 144550 144550 157
    4/1/2004 3:02:01 144550 144650 144500 144500 134
    4/1/2004 3:56:46 144500 144500 144350 144350 249
    4/1/2004 4:07:38 144350 144350 144200 144300 229
    4/1/2004 4:24:43 144300 144400 144250 144300 233
    4/1/2004 4:51:18 144300 144350 144200 144200 281
    4/1/2004 5:21:58 144200 144250 144150 144200 451
    4/1/2004 5:39:05 144200 144350 144100 144200 163
    4/1/2004 5:53:38 144150 144250 144100 144100 167
    4/1/2004 6:01:31 144100 144150 143950 143950 265
    4/1/2004 6:11:18 143950 144050 143800 143850 253
    4/1/2004 6:18:01 143850 143950 143800 143850 190
    4/1/2004 6:23:02 143850 143950 143800 143900 290
    4/1/2004 6:28:09 143900 144000 143850 144000 180
    4/1/2004 6:44:43 143900 144000 143800 144000 140

    By bar 12, these two charts already differ in their volume vals:
    Bar Time Open High Low Close Vol
    12A 5:16:06 144200 144250 144150 144200 459
    12B 5:21:58 144200 144250 144150 144200 451


    By bar 13 these two charts differ drastically in their timestamps and their volume vals:
    Bar Time Open High Low Close Vol
    13A 5:37:59 144200 144350 144100 144200 169
    13B 5:39:05 144200 144350 144100 144200 163

    By bar 20, the last bar shown for each table above, the timestamps and open values differ:
    Bar Time Open High Low Close Vol
    20A 6:46:15 143850 144000 143800 144000 140
    20B 6:44:43 143900 144000 143800 144000 140

    As you can imagine, the discrepancies between the two downloads are numerous and continue until the end of the files.
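The comparison above can be automated. The following is a minimal sketch, not an eSignal tool: it assumes the two exports have been reduced to whitespace-separated rows in the column order shown in the tables, and the helper names `parse` and `diff_bars` are made up for illustration. Only bars 10 to 13 from Table A and Table B are included to keep it short.

```python
HEADER = ["Date", "Time", "Open", "High", "Low", "Close", "Vol"]

# Bars 10-13 copied from Table A (12:30 PM download).
TABLE_A = """\
4/1/2004 4:24:43 144300 144400 144250 144300 233
4/1/2004 4:51:18 144300 144350 144200 144200 278
4/1/2004 5:16:06 144200 144250 144150 144200 459
4/1/2004 5:37:59 144200 144350 144100 144200 169
"""

# Bars 10-13 copied from Table B (11:30 PM download).
TABLE_B = """\
4/1/2004 4:24:43 144300 144400 144250 144300 233
4/1/2004 4:51:18 144300 144350 144200 144200 281
4/1/2004 5:21:58 144200 144250 144150 144200 451
4/1/2004 5:39:05 144200 144350 144100 144200 163
"""


def parse(text):
    """Turn whitespace-separated rows into dicts keyed by column name."""
    return [dict(zip(HEADER, line.split()))
            for line in text.splitlines() if line.strip()]


def diff_bars(a, b, first_bar=1):
    """Return (bar_number, [differing columns]) for every bar that differs."""
    out = []
    for number, (row_a, row_b) in enumerate(zip(a, b), start=first_bar):
        changed = [col for col in HEADER if row_a[col] != row_b[col]]
        if changed:
            out.append((number, changed))
    return out


diffs = diff_bars(parse(TABLE_A), parse(TABLE_B), first_bar=10)
for number, changed in diffs:
    print(number, changed)
```

On these four rows the first difference is reported at bar 11, in the volume column only; bars 12 and 13 differ in both time and volume.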

    Why do these discrepancies occur? This makes backtesting with tick data very difficult. My data points change depending on when I log in to Esignal, even when the charts seem to start with the exact same bar!!

    If anyone knows how esignal plans to deal with this please let me know.

    I read another post about synchronizing tick servers, but these two charts seemed to be synchronized from the first bar, and they still go out of synch within 13 bars.

    This cannot be based on my PC's clock because the bars go out of synch on data from 4/1/2004 that I am downloading at different times of the day on 4/13/2004.

    This has to be a problem with Esignal's data on their end.

    Again, any help in fixing and/or understanding this grave problem is greatly appreciated.

  • #2
    Hi cyperian,

    Thanks for the post. To the best of my knowledge, the Tick server synchronization project mentioned here has been completed. I'll forward this issue along with the data you enclosed over to the QA team so we can look into why the results are different. As soon as I get some information why this is happening, I'll reply back to your post. Thanks.



    • #3
      Comments on Reply, tick data integrity, etc

      (I downloaded the data I posted by using your "Tools | Data Export" menu selection and then hitting the "Save as CSV..." button.)

      I am quoting the text of the FAQ you included in your reply below with a few of my comments interspersed:

      **FAQ*******************************
      I'm comparing eSignal on two different computers and I'm noticing very slight differences in 1 minute candlesticks. What would account for these differences? I also notice changes when I refresh my own chart?

      We time-stamp the quotes as they are received from the exchanges. These quotes are then sent to our tick servers to build our database for tick and interval charts. Records are processed within milliseconds of each other. Even so, it is possible that slight differences can occur across the various tick servers, resulting in a trade being moved up to the next bar.
      ************************************
      Reply:
      I can see why/how this should affect time bars but I do not understand why this should affect 50T tick bars. At least the OHLCV vals on a tick chart should be the same from different tick receiver/server to different tick receiver/server as long as the ticks timestamped by each receiver/server are in the same order, even if they have different timestamp values. Each 50T tick bar should fill up on the fiftieth tick, regardless of the timestamp on that tick.

      If this were the case, then we might notice different timestamp values for tick bars from different servers, but their OHLCV values would all be consistent.

      This is not what happens. The data discrepancies I posted include OHLCV differences within < 20 50T tick bars on two different data/chart downloads that seem to be synchronized perfectly on bars 1-10.
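The argument above can be sketched in a few lines. This is an illustration under stated assumptions, not eSignal's actual bar-building logic: ticks are modeled as `(timestamp, price, size)` tuples, the bar is labeled with the timestamp of its closing tick (an assumption), and the two "servers" see the same ticks in the same order with a fixed 2 ms clock offset. The function name `build_tick_bars` and the synthetic prices are hypothetical.

```python
def build_tick_bars(ticks, ticks_per_bar=50):
    """Build bars that close purely on tick count.

    `ticks` is a list of (timestamp, price, size) in arrival order.
    The timestamp is only carried along as a label; it never decides
    which bar a tick belongs to.
    """
    bars = []
    for start in range(0, len(ticks) - ticks_per_bar + 1, ticks_per_bar):
        chunk = ticks[start:start + ticks_per_bar]
        prices = [price for _, price, _ in chunk]
        bars.append({
            "time": chunk[-1][0],          # label: time of the closing tick (assumed)
            "open": prices[0],
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],
            "vol": sum(size for _, _, size in chunk),
        })
    return bars


def ohlcv(bars):
    """Strip the time label so only O/H/L/C/V are compared."""
    return [{k: v for k, v in bar.items() if k != "time"} for bar in bars]


# Synthetic ticks: same prices, sizes and order on both servers.
raw = [(144200 + 50 * ((i * 7) % 5 - 2), 1 + i % 4) for i in range(200)]
server_a = [(i * 1.000, price, size) for i, (price, size) in enumerate(raw)]
server_b = [(i * 1.000 + 0.002, price, size) for i, (price, size) in enumerate(raw)]

bars_a = build_tick_bars(server_a)
bars_b = build_tick_bars(server_b)

print(ohlcv(bars_a) == ohlcv(bars_b))                   # OHLCV comparison
print([a["time"] == b["time"] for a, b in zip(bars_a, bars_b)])  # label comparison
```

Under these assumptions the OHLCV values match bar for bar while every time label differs, which is the behavior the reply expects from count-based bars.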


      **FAQ*******************************
      There's an additional factor to keep in mind as well. When you initially request a new chart, the data is supplied from our Tick Server and loaded onto your desktop. From that point forward, data is then streamed in from our Network and cached locally on your system. When you "refresh", all of the data on that chart is re-sent by our Tick Server and the cached data is cleared. If your PC clock is slightly fast or slow, this could also result in slight differences after a refresh.
      ************************************
      Reply:
      I do not understand what you are saying here. When I refresh a chart, I understand that I will get new data from your servers. But at least this historical data that you are sending to me should have a timestamp associated with it. And if it does, my data should not change from one refresh to the next based on any value of my local PC clock.

      However, if you guys are letting code on my local machine timestamp historical data based on a most recent tick whose timestamp is dependent on my local PC clock, or on some other method dependent on my local PC clock, then this is ludicrous!

      I sympathize with your attempts to reduce the load on and thus speed up your tick servers, but obviously, any solution that causes a loss of data integrity is no solution.

      I don't want fast data that is wrong or that changes from one refresh to another. You guys need to timestamp your data before you send it to me regardless of how you handle timestamping of data that you receive from the exchanges and then process BEFORE you send it to me, the end-user.


      **FAQ*******************************
      These two issues explain fairly small differences in bar-type charts. If you see significant differences, you should contact Technical Support.
      ************************************
      Reply:
      Your above comments indicate that there are two possible sources of error:

      1) Your tick receiver/servers' PC clocks might not agree with one another. This disagreement could result in the same tick receiving a different timestamp on different eSignal tick receiver/servers. This could result in data discrepancies between data downloaded from different eSignal tick servers.

      I understand how this can be a difficult problem to solve for real-time data expressed in time-defined bars, i.e. 1 minute bars, 5 minute bars, etc.

      This problem is somewhat more tractable for tick data. If you cannot get timestamps to match up, at least get tick bars to match up on OHLCV vals by keeping tick counts that are based only on the number and order of ticks received from the exchange and not on the timestamp associated with each tick. This will also allow indicator vals based on OHLCV vals of tick data to be consistent.

      2) Esignal assigns timestamping duties to code located locally on each user's machine. This might result in data discrepancies due to PC clock drift on each user's machine.

      eSignal should not assign timestamping duties to local machines. This is bound to compromise data integrity, without fail.


      **FAQ*******************************
      We continue to investigate the possibility of altering our current network configuration so that we would take the time-stamp SENT by most exchanges. This would increase the size of each record, increase our bandwidth usage and increase the load on our servers because we are talking about several million records each day. This project is still under consideration.
      ************************************
      Reply:
      This seems like the simplest and most obvious solution. I am sorry to hear that you guys didn't do this in the first place. I bet you spend/spent more on the programming solutions you try/tried to implement than you would have spent on lots of new fast servers to spread out the load and provide good, reliable real-time data service to your paying customers.


      ** FAQ******************************
      Update: We are implementing an interim solution that will allow all of our tick servers to synchronize to a single network timestamp. This should greatly enhance the consistency of our intraday history data across our server farms. This solution should be completed by mid to Mid-Feb, 2004.
      ************************************
      Reply:
      I'm no networking expert, but the very thought of synching to a network timestamp seems problematic. The inconsistent tick data I am downloading seems to confirm this fear.

      ----------------------------------------------------------------------------------

      I don't even know for sure that these 2 separate 10Day, 50T tick chart data downloads that I performed at different times on 4/13/2004 came from different servers. I sure hope they did, but maybe they didn't.

      Obviously, I can't know what is going on without looking at your code, but these are just a few of my thoughts. Whatever is going on does seem to be happening in spite of the fact that both chart/data exports/downloads do seem to share a perfectly synchronized starting point/tick.

      The data posted is in perfect synch until bar 11 where only the volume differs. This probably indicates that a tick got placed in one bar on one tick receiver/server while it got placed in another bar on a different tick receiver/server.

      From then on, the tick shifting/discrepancies continue and show up throughout the remainder of the two data/chart downloads and on the charts themselves as viewed after logging in to Esignal at different times.

      Again, why discrepancies should occur between tick bars on different servers that should only complete after 50 ticks have been recorded in them, regardless of the timestamp values associated with the ticks the bars contain, is up for grabs.

      Assuming this is a problem with recording that is due to discrepancies between PC clocks on different tick servers, this is happening in the middle of the night when there is little traffic for these particular markets. Still, I suppose eSignal's servers might be plenty busy recording and distributing data for international markets at this time....

      These download inconsistencies might indicate some sort of counting discrepancy between different tick servers or something. I really don't know.

      I'm sure you guys have all kinds of data validation routines that you can look into. This bug is in there somewhere if it's not actually a result of your entire approach to the problem of data collection and distribution.

      Thanks for your quick reply, and I hope you guys can get this resolved very soon.

      This problem makes backtesting and even trading with tick data quite problematic since your data for a given day changes fairly drastically depending on when you log in to Esignal.


      Braden



      • #4
        Hi Cyprian,

        We did some further research on the recent implementation of the Server Synchronization feature implemented on our servers. This change forces every server's system clock to be sync'd with the rest of the servers... so every minute they will all be exactly in line with each other. As time progresses from one sync to the next, there is a very small chance that their system clocks could fall out of sync by a millisecond or two, thus affecting the time stamps of the trades on that server. When the next minute rolls around, the servers will sync again, and the cycle begins again.

        When looking at an interval based on minutes, the servers should all align, however when looking at a non-minute interval (i.e. 50t, 10s, etc.), then there is a chance that you will see discrepancies between servers... even with the Time Synchronization steps we are taking.

        We are continually striving to ensure that the quality of our data is accurate and timely. In order to have a service that has all servers perfectly sync'd, you're likely looking at a minimum of three to four times the cost of eSignal due to the extra overhead that that process takes.
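For time-defined intervals, the effect of a one or two millisecond drift can be sketched as follows. This is only an illustration: the timestamps are invented, times are expressed as milliseconds since midnight, and `bucket_start` is a hypothetical helper, not an eSignal function. It covers time-based bars such as 10s only and says nothing about how count-based bars such as 50T are assembled.

```python
def bucket_start(ts_ms, interval_ms):
    """Return the start (ms since midnight) of the bar containing ts_ms."""
    return ts_ms - ts_ms % interval_ms


# The same trade stamped by two servers whose clocks differ by 2 ms.
stamp_a = 34_209_999   # 09:30:09.999 on server A (invented value)
stamp_b = 34_210_001   # 09:30:10.001 on server B (invented value)

ten_seconds = 10_000
one_minute = 60_000

# 10-second bars: the trade falls on opposite sides of a bar boundary.
print(bucket_start(stamp_a, ten_seconds), bucket_start(stamp_b, ten_seconds))

# 1-minute bars: both stamps fall inside the same 09:30 bar in this case.
print(bucket_start(stamp_a, one_minute), bucket_start(stamp_b, one_minute))
```

Here the trade lands in the 09:30:00 ten-second bar on one server and the 09:30:10 bar on the other, while both servers agree on the one-minute bar. A trade stamped within a millisecond or two of a minute boundary would split across one-minute bars in the same way.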



        • #5
          Timestamp issues with tick data

          Sorry to be so longwinded, but as I read over your FAQ again, it seems that eSignal is being extremely careless about timestamping their tick data in efforts to increase speed.

          **FAQ*******************************
          I'm comparing eSignal on two different computers and I'm noticing very slight differences in 1 minute candlesticks. What would account for these differences? I also notice changes when I refresh my own chart?

          We time-stamp the quotes as they are received from the exchanges. These quotes are then sent to our tick servers to build our database for tick and interval charts. Records are processed within milliseconds of each other. Even so, it is possible that slight differences can occur across the various tick servers, resulting in a trade being moved up to the next bar.

          There's an additional factor to keep in mind as well. When you initially request a new chart, the data is supplied from our Tick Server and loaded onto your desktop. From that point forward, data is then streamed in from our Network and cached locally on your system. When you "refresh", all of the data on that chart is re-sent by our Tick Server and the cached data is cleared. If your PC clock is slightly fast or slow, this could also result in slight differences after a refresh.

          We continue to investigate the possibility of altering our current network configuration so that we would take the time-stamp SENT by most exchanges. This would increase the size of each record, increase our bandwidth usage and increase the load on our servers because we are talking about several million records each day. This project is still under consideration.
          ************************************

          From the above FAQ I gather, eSignal first ignores the timestamp for each tick sent by the exchanges in order to reduce load on their servers and increase speed.

          I'm hoping that eSignal then bottlenecks its process by giving each tick received from the exchange an agreed upon eSignal timestamp before it is passed along in the process to another machine. Assuming that more than one eSignal machine is receiving quotes from the exchange, these machines should have some algorithm to make their tick timestamps agree before the tick, with timestamp, is sent to any other machine for further distribution and/or storage.

          If this process is adhered to, it makes no sense that separate tick servers that receive info at different times in the future should move ticks into different bars.

          Since the FAQ says that tick servers might assign ticks to different bars, I assume the ticks it receives have no timestamps. Maybe a bit of strategic coding at this level could at least make eSignal's data consistent with itself, even if it disagreed slightly with what the exchange actually reported.

          But it gets even worse, not only might eSignal's different tick servers disagree, in efforts to decrease server load and increase speed again, eSignal seems to send ticks to users with no timestamps. Code run on the user's local machine, which depends on the user's local PC clock, is then used to timestamp an incoming tick.

          I just can't believe how LUDICROUS this is!! Only with this level of insanity does it become possible for a person's chart to change between separate refreshes!!

          EVERY TICK MY LOCAL MACHINE RECEIVES SHOULD HAVE A TIMESTAMP ASSOCIATED WITH IT THAT IS GENERATED AND AGREED UPON BY ESIGNAL BEFORE IT IS SENT TO ME, THE END-USER. ESIGNAL SHOULD NEVER ALLOW MY LOCAL MACHINE TO TIMESTAMP ANY TICK IT RECEIVES!

          ESIGNAL SHOULD ALSO NOT ALLOW MULTIPLE TICK SERVERS TO TIMESTAMP TICKS DIFFERENTLY ACCORDING TO THEIR LOCAL PC CLOCK VALS THAT MAY BE DIFFERENT! EVERY TICK SENT FROM A QUOTE RECEIVING MACHINE CONNECTED DIRECTLY TO THE EXCHANGE SHOULD HAVE AN AGREED UPON TIMESTAMP ASSOCIATED WITH IT BEFORE IT IS RECEIVED BY ANOTHER ESIGNAL MACHINE THAT IS USED FOR DATA STORAGE AND/OR DISTRIBUTION/SERVING!

          THE SAME PROBLEM IS REARING ITS UGLY HEAD IN 2 PLACES!!

          (1) WHEN ESIGNAL'S TICK RECEIVERS SEND TICKS WITH NO TIMESTAMPS TO TICK SERVERS, DATA DISCREPANCIES ARE GENERATED!!

          (2) WHEN ESIGNAL'S TICK SERVERS SEND TICKS WITH NO TIMESTAMPS TO MY LOCAL MACHINE, MORE DATA DISCREPANCIES ARE GENERATED!!

          I can't believe eSignal has actually CREATED this problem for itself!! I'm sure the exchanges have gone to great lengths, computationally, to ensure data integrity across all their tick servers before they send out their data.

          Esignal should just use the timestamps and/or tick numbers provided by the exchanges themselves and avoid all this hassle!!

          It will be cheaper for eSignal to buy a bunch of new servers, routers, etc. and transmit slightly larger packets (THAT INCLUDE A TIMESTAMP FOR EACH TICK) way fast and in parallel than it will be for them to program their way out of this mess!!

          Braden



          • #6
            TICK DATA OHLCV integrity!!

            Duane,

            Thanks for your diligence and speed in replying to my posts!! :-)

            (Your most recent post went up while I was still composing my most recent post! :-)

            I still don't understand why eSignal can't do the right thing about tick data with little or no overhead.

            Like I have said previously, each tick bar should fill up only after the given number of ticks to be displayed in each bar have completed.

            Tick bars should not be dependent on any time-based description of a tick. Only the order in which ticks are received should matter.

            Why should OHLCV vals of 50T tick bars not be consistent even when you cannot get the timestamps for these ticks to align properly?

            Unless your data integrity is extremely bad, resulting in ticks being received in a different ORDER from one tick server to the next, then consistent OHLCV vals for tick bars should be attainable so long as an accurate local tick count is kept on each tick server.

            Unless timestamps are used to determine inclusion in a tick bar, which cannot be the case given the timestamp discrepancies in tick data I have downloaded, your synchronization attempts have not even addressed the issues with TICK DATA integrity that I have presented to you.

            If you simply kept accurate local tick counts on each tick server, the different tick data downloads that I posted should not get out of alignment after being completely in-synch for the first 10 50T tick bars. (Again, this assumes that each tick server receives all ticks in the correct ORDER. If at least this is not happening, I would definitely like to be informed of this!!)

            I know this would require a couple of extra lines of code to generate tick numbers and it would require an extra field in your database tables in which you record ticks for each market, but I have a hard time believing this would triple or quadruple the cost of doing business for eSignal.

            I'm sure most users of tick data would prefer to have tick bars whose OHLCV are consistent across servers even if their timestamps are not.

            Please respond to me about this as I don't think this solution would require lots of programming or much overhead in terms of CPU processing, storage, etc.

            Thanks.

            Braden



            • #7
              A more detailed question...

              As I thought about things a bit more, there are of course more efficient methods of tagging ticks that would not require you to record a new field (i.e. for a tick number) for each tick. You could tag every 1000th tick, etc. ... I'm sure your programmers, DB guys, whoever, can come up with plenty of great and efficient ways to keep tick counts in ways that will greatly improve data integrity in tick bars from the current state of eSignal.
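One way the "tag every 1000th tick" idea could work is sketched below. Everything here is hypothetical: the stream layout, the tag spacing, and the names `tag_stream` and `bar_closes` are assumptions for illustration, not a description of any eSignal format. A consumer recovers each tick's absolute number by counting forward from the most recent tag, and a bar closes whenever that number reaches a multiple of the bar size.

```python
def tag_stream(prices, every=1000):
    """Attach an absolute sequence number to every `every`-th tick only.

    Returns a list of (price, tag) where tag is an int or None.
    """
    return [(price, i if i % every == 0 else None)
            for i, price in enumerate(prices)]


def bar_closes(stream, ticks_per_bar=50):
    """Return the absolute tick numbers on which a bar closes.

    The absolute number is recovered by counting from the latest tag.
    Ticks seen before the first tag cannot be numbered and are skipped.
    """
    seq = None
    closes = []
    for _price, tag in stream:
        if tag is not None:
            seq = tag            # resynchronize on a tagged tick
        elif seq is not None:
            seq += 1             # count forward from the last tag
        if seq is not None and (seq + 1) % ticks_per_bar == 0:
            closes.append(seq)
    return closes


stream = tag_stream(list(range(5000)), every=1000)

full = bar_closes(stream)            # consumer that saw the stream from tick 0
late = bar_closes(stream[1234:])     # consumer that joined at tick 1234

print(full[:3], late[:3])
```

In this sketch the late joiner has to wait for the tag at tick 2000, after which its bar boundaries coincide with those of the consumer that saw the whole stream, at the cost of one extra field per 1000 ticks rather than per tick.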

              Here is the more detailed question...

              If this is already being done and the problem is that ticks are being received in a different ORDER on different tick servers, and/or if different tick servers receive different sequences entirely (e.g. different numbers of ticks on different servers as determined by counting from an identifiable agreeing tick that both servers do share, etc.) I would like to know this.

              This would be a much more severe problem than just two tick servers disagreeing about the timestamps for a matching number of ticks on both servers.
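The cases distinguished above (same ticks in a different order versus a different set of ticks) can be told apart mechanically once two servers' sequences are lined up from a shared tick. The sketch below is illustrative only: ticks are modeled as `(price, size)` pairs, the data is synthetic, the bar size is shortened to 5 ticks, and `classify` and `bar_volumes` are hypothetical helper names.

```python
from collections import Counter


def classify(a, b):
    """Compare two tick sequences that start from the same shared tick."""
    if a == b:
        return "identical"
    if Counter(a) == Counter(b):
        return "same ticks, different order"
    return "different tick sets"


def bar_volumes(ticks, ticks_per_bar):
    """Volume of each complete count-based bar."""
    return [sum(size for _, size in ticks[i:i + ticks_per_bar])
            for i in range(0, len(ticks) - ticks_per_bar + 1, ticks_per_bar)]


# Synthetic reference stream of 30 ticks.
server_a = [(144200 + 50 * (i % 3), 1 + i % 4) for i in range(30)]

# Server B is missing tick 12; server C has ticks 12 and 13 swapped.
server_b = server_a[:12] + server_a[13:]
server_c = server_a[:12] + [server_a[13], server_a[12]] + server_a[14:]

print(classify(server_a, server_b))
print(classify(server_a, server_c))

vol_a = bar_volumes(server_a, 5)
vol_b = bar_volumes(server_b, 5)
first_mismatch = next(i for i, (x, y) in enumerate(zip(vol_a, vol_b)) if x != y)
print(vol_a, vol_b, first_mismatch)
```

With one tick missing, the first two 5-tick bars agree and the third is the first to differ, after which every later boundary is shifted by one tick. That is the kind of cascade a single dropped or extra tick would be expected to cause in count-based bars.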

              Hopefully, you guys will be able to attain constancy or near-constancy of OHLCV vals for tick bars across tick servers even if the timestamps of these tick bars across tick servers do not quite agree due to minor drifting of tick server PC clocks between 1 minute synchs.

              Braden
