Question on eSignal’s historical data, missing data.

  • Question on eSignal’s historical data, missing data.

    After running the same simulation on the same data set and getting different results, I found the problem: eSignal’s historical data for the E-mini S&P 500 (ES) is corrupted.

    To prove it, I downloaded today's data for ES from 00:00:00 to 15:59:59 twice, just minutes apart, using eSignal Tick Downloader. The two files should be exactly the same, but they are not: one file is missing data. Since the two files are too big to upload to the forum, I uploaded a screenshot from KDiff3 (file comparison software). There are several differences between the two files, but the forum only allows me to upload one image.
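For readers who want to repeat this check without a visual diff tool, here is a minimal Python sketch that treats each download as a plain-text file with one record per line and reports the records present in one file but absent from the other. The function names and the sample record layout are hypothetical; the actual EPF record format is not described in this thread.

```python
from collections import Counter


def missing_records(lines_a, lines_b):
    """Return a Counter of records that occur more often in lines_b than in lines_a.

    Each record is compared as a whole line of text (line endings stripped),
    so duplicated ticks are counted rather than collapsed.
    """
    count_a = Counter(line.rstrip("\r\n") for line in lines_a)
    count_b = Counter(line.rstrip("\r\n") for line in lines_b)
    # Counter subtraction keeps only records with a positive remaining count.
    return count_b - count_a


def compare_files(path_a, path_b):
    """Compare two downloads on disk; assumes one text record per line."""
    with open(path_a, encoding="utf-8", errors="replace") as fa, \
         open(path_b, encoding="utf-8", errors="replace") as fb:
        return missing_records(fa, fb)
```

Calling `compare_files("ES-120208.EPF", "ES-120208B.EPF")` would list what the first download lacks; swapping the arguments checks the other direction.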

    I would like to ask eSignal, can we trust your data?

  • #2
    While your screenshot shows a difference, there's not enough information here to give you a definitive answer. In this case, it appears the data is off by 0.013%, or matches 99.987%. With the amount of data being processed across multiple links, in various locations and off multiple servers, a goal of 100% match is highly unlikely. If we aimed for 99.999% matching, that would be a difference of 60 ticks on this day for this symbol.

    To troubleshoot this specifically, we'd need to know which servers you were connected to when you downloaded this data. If you want to run the test again, please ping us on LiveRep or call us so we can check which server you're on and take this further.

    Thanks.



    • #3
      Scott,

      The difference is much more than .013%.

      The two files are:
      ES-120208.EPF 11,545 KB
      ES-120208B.EPF 12,195 KB

      Just comparing the file sizes, the difference is 5.33%.

      Also, is there a faster way to find out which server I am connected to? Something like netstat at the Command Prompt? Contacting Customer Support takes forever.

      You wrote: "...a goal of 100% match is highly unlikely..." but historical tick data is already stored on your servers. It is not real-time data. If that is the case, why can't we expect 100% accuracy? This is like downloading a picture from the internet and getting a different, incomplete file each time.

      Thanks.



      • #4
        I can't speak to the differences in file sizes, but if you look at the number of records, there is indeed a difference of just 0.013%, and when comparing data we traditionally look at the number of records, not file sizes.

        I'm not aware of any external way to determine what tick server you are hitting (there are over 60 of them between our two main data centers on each US coast) so you'll need to ping us via LiveRep or call us so we can check that.

        The stored data is collected in real time and hence should match what you see in "real-time". We don't have a master server that all servers are synced from, so each tick server collects data on its own, hence the possibility of some minor deltas. The upside of collecting data this way is redundancy, speed and scalability, but we do acknowledge that real-time collection of data across multiple locations, circuits, servers, etc. can cause small differences in record counts. Our informal goal would be 99.99% matching. We would need a different network structure and collection methodology if our goal was 100% matching across all server types.

        Hope that helps.



        • #5
          Scott,

          Thank you for your prompt response.

          The image that I posted was only a small sample of the differences between the two files. There is a restriction on the maximum file size that I can post. Although you are correct that we should compare the number of records, since the EPF file is just a text file, comparing file sizes gives you the same result.

          Numbers of records for exactly the same period of time:

          File ES-120208.EPF 452467
          File ES-120208B.EPF 477915

          A difference of 25448 records, or 5.3%
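The arithmetic above can be checked with a short Python sketch. The function name is hypothetical, and taking the larger file as the baseline is an assumption; it is the choice that agrees with the 5.3% quoted here.

```python
def record_gap(count_a, count_b):
    """Return (absolute difference, percent difference relative to the larger count)."""
    larger = max(count_a, count_b)
    diff = abs(count_a - count_b)
    # Guard against two empty files to avoid dividing by zero.
    pct = 100.0 * diff / larger if larger else 0.0
    return diff, pct


# Record counts reported for ES-120208.EPF and ES-120208B.EPF.
diff, pct = record_gap(452467, 477915)
print(f"{diff} records, {pct:.2f}%")
```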

          I think your network configuration is flawed. Each server collecting its own data is like a bank with an independent database at each branch, making the balance on your checking account different depending on which branch you go to.

          As a solution, I contacted CME directly and will buy the historical data from them. I shouldn't have to do that, since I am already paying for the data through eSignal.

          Thank you.



          • #6
            That gap is way more than we typically see but without looking at the files themselves and what servers you were on, it's hard to diagnose further.

            In terms of our set-up, I wouldn't use a bank for the analogy but we tend to pick what works for our point of view, don't we?

            There are many positives to being able to scale with speed and redundancy, but keeping every server in lockstep is not easy when you're dealing with over 1-2 million records a second. It's a worthy technical theory debate.

            Thanks.



            • #7
              To identify the tick server one is connected with, one can use... netstat -n
              The tick server is the IP associated with Port 2192.
              One way to get the command window shown in the screenshot is...
              Taskbar Start > select Run from the menu > and type in CMD

              Here's an article detailing the eSignal remote Port assignments and the types of data associated with each.
              http://kb.esignalcentral.com/article...ticle=1327&p=1

              Here's a partial paste from the aforementioned article.
              Port 2189 - Connection Manager and Financial Quotes Server (required)
              Port 2190 - News Server (required for News access)
              Port 2192 - Intraday History Server (required for intraday and tick data)
              Port 2193 - International Tick Server (required for International Intraday history data)
              Port 2194 - Daily History Server (required for daily historical data)
              Port 2196 - Market Depth Server (required for Market Depth data)

              The 2189 Port is the one the Data Manager (winros.exe) uses.
              The others are used directly by eSignal (winsig.exe).
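As a rough illustration of the netstat approach, the following Python sketch pulls the remote address on Port 2192 out of `netstat -n` output. It assumes the Windows column layout (Proto, Local Address, Foreign Address, State); the function names are hypothetical and not part of eSignal.

```python
import subprocess

TICK_PORT = 2192  # Intraday History Server port, per the list above


def remote_hosts_on_port(netstat_output, port=TICK_PORT):
    """Return remote hosts whose foreign-address port equals `port`.

    Assumes Windows-style `netstat -n` rows:
    Proto, Local Address, Foreign Address, State.
    """
    hosts = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, the banner, and the column-header row.
        if len(fields) < 3 or not fields[0].upper().startswith("TCP"):
            continue
        # Split on the last colon so the port is separated from the address.
        host, sep, remote_port = fields[2].rpartition(":")
        if sep and remote_port == str(port) and host not in hosts:
            hosts.append(host)
    return hosts


def current_tick_servers():
    """Run `netstat -n` on this machine and return the tick server address(es)."""
    result = subprocess.run(["netstat", "-n"], capture_output=True,
                            text=True, check=True)
    return remote_hosts_on_port(result.stdout)
```

Running `current_tick_servers()` while eSignal is connected should return the address to pass along to support; an empty list means no connection on that port was found.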

              LAM