1 (edited by Popov 2024-12-25 11:52:47)

Topic: Supercharging the Data Export Script: From 5 Minutes to 7 Seconds!

Hey guys,

As many of you know, I work with some pretty hefty historical datasets. When I used Popov's excellent MT4 Data Export script to export data to Expert Advisor Studio, it always took a very long time. I'm talking about a good 5 minutes just to get the M15 (900000 bars), M30 (450000 bars), and H1 (225000 bars) timeframes exported from my 37 years of historical data.

Now, I don't know about you, but I'm all about efficiency, so I decided to take a look under the hood and see if I could give the script a bit of a boost.

The main bottleneck was the way the script was building the JSON strings using the StringAdd function. Every time StringAdd is called, MT4 needs to resize the string's memory, which takes a lot of time, especially when dealing with massive amounts of data like in my case.

So, what did I do? I implemented a much faster approach: pre-allocating memory for the strings using uchar arrays and copying the characters one by one. This way, we avoid all those unnecessary memory reallocations, and the script can just focus on getting the job done quickly.
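For anyone who wants to see the pattern in isolation, here is a minimal sketch of the pre-allocated buffer idea. It is written in standard C++ rather than MQL4 so it can be compiled outside MetaTrader, and it is not the code from the script: the names appendChars, formatPrice and pricesToJson are made up for illustration, and the per-value size estimate is the one quoted later in this thread.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Copy the characters of `text` into `buffer` starting at `pos`.
// The buffer grows only if the size estimate was too small (should be rare).
// Returns the new write position.
std::size_t appendChars(std::vector<unsigned char> &buffer, std::size_t pos,
                        const std::string &text)
{
    if (pos + text.size() > buffer.size())
        buffer.resize((pos + text.size()) * 2);
    for (std::size_t i = 0; i < text.size(); ++i)
        buffer[pos + i] = static_cast<unsigned char>(text[i]);
    return pos + text.size();
}

// Format a price with a fixed number of decimals, e.g. 1.1 -> "1.10000".
std::string formatPrice(double value, int digits)
{
    int len = std::snprintf(nullptr, 0, "%.*f", digits, value);
    std::string out(static_cast<std::size_t>(len), '\0');
    std::snprintf(&out[0], static_cast<std::size_t>(len) + 1, "%.*f", digits, value);
    return out;
}

// Build a JSON array of prices in one pre-allocated buffer and convert it
// to a string once at the end.
std::string pricesToJson(const std::vector<double> &prices, int digits)
{
    // Estimate: (15 + digits + 2) characters per value, plus the two brackets.
    std::vector<unsigned char> buffer(prices.size() * (15 + digits + 2) + 2);
    std::size_t pos = appendChars(buffer, 0, "[");
    for (std::size_t i = 0; i < prices.size(); ++i)
    {
        if (i > 0)
            pos = appendChars(buffer, pos, ",");
        pos = appendChars(buffer, pos, formatPrice(prices[i], digits));
    }
    pos = appendChars(buffer, pos, "]");
    return std::string(buffer.begin(), buffer.begin() + pos);
}
```

Calling pricesToJson({1.1, 1.1001}, 5) returns "[1.10000,1.10010]". Note that this only shows the technique; it says nothing about timing, since string appends behave differently in C++ than in MT4.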

I also added some error handling to make the script even more robust and user-friendly. Now it checks if the data is retrieved correctly, and if the files are opened and written properly.
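The kind of checks meant here can be sketched roughly as follows. Again this is plain C++ and not the script itself; writeExport is a hypothetical name, and the real script would use the MQL4 file functions instead of std::ofstream.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Write `content` to `path`. Every step is checked, and a failure is
// reported on stderr and signalled by returning false.
bool writeExport(const std::string &path, const std::string &content)
{
    if (content.empty())
    {
        std::cerr << "No data retrieved for " << path << "\n";
        return false;
    }
    std::ofstream file(path, std::ios::binary | std::ios::trunc);
    if (!file.is_open())
    {
        std::cerr << "Cannot open " << path << " for writing\n";
        return false;
    }
    file << content;
    file.flush();
    if (!file)
    {
        std::cerr << "Write to " << path << " failed\n";
        return false;
    }
    return true;
}
```

The point of the design is simply that each stage (data present, file opened, data written) fails loudly on its own instead of producing an empty or truncated export.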

The bottom line: the modified script takes just 7 seconds to export the same dataset that took 5 minutes before. That's a speed improvement of roughly 40x! I'm making this modification freely available so that everyone can benefit from this turbocharged version, and so that Mr. Popov can possibly include it directly in future EA Studio releases.

I hope this optimization helps everyone streamline their data export workflow.

Cheers,

Lorenz

Edit (Popov): See the scripts attached below

Re: Supercharging the Data Export Script: From 5 Minutes to 7 Seconds!

This is nice! In coding, I find it most satisfying to make something perform quicker, so much quicker that you can almost physically feel it. This goes into that category :) I'm making it a sticky topic for the time being.

Re: Supercharging the Data Export Script: From 5 Minutes to 7 Seconds!

I do something different: in MT5 I leave only the pairs I want in the Market Watch, and it exports them all automatically. Since I work with a spread of 30, I always leave the spread at 30 in the Market Watch.

Re: Supercharging the Data Export Script: From 5 Minutes to 7 Seconds!

Here is the file I used.

Re: Supercharging the Data Export Script: From 5 Minutes to 7 Seconds!

footon wrote:

This is nice! In coding, I find it most satisfying to make something perform quicker, so much quicker that you can almost physically feel it. This goes into that category :) I'm making it a sticky topic for the time being.

Thanks for the sticky, footon! I totally agree, the feeling of making something this much faster is incredibly satisfying every time, especially when you're working with datasets as large as mine: 28 pairs over 37 years. What used to take me over 2 hours each week now finishes in 4 minutes! A real game-changer for my workflow. Hope it helps others too.

Re: Supercharging the Data Export Script: From 5 Minutes to 7 Seconds!

GeekTrader,
Can you explain the reasoning for this estimation?

   // Estimate string sizes and pre-allocate 
   const int estimatedSize = bars * (15 + digits + 2);

I see it as:
- 15 chars before the decimal point
- "digits" chars after the decimal point
- 2 chars: 1 for the decimal point and 1 for the comma.

Is it correct, or am I missing something?

My concern is the hardcoded 15.
I assume it would be more efficient if we knew the exact number of characters needed.
I'll do some tests.
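To put numbers on that breakdown, here is a small sketch of the arithmetic in C++ (the expression is the same in MQL4). The 900000 bars are the M15 figure from the first post; the single integer digit and digits = 5 are assumptions for a EURUSD-like quote such as 1.10000, and both function names are hypothetical.

```cpp
// Size reserved by the estimate from the script:
// 15 integer chars + `digits` decimals + 2 (decimal point and comma) per bar.
long long estimatedSize(long long bars, int digits)
{
    return bars * (15 + digits + 2);
}

// Size actually needed when every value has `integerDigits` chars
// before the decimal point.
long long neededSize(long long bars, int integerDigits, int digits)
{
    return bars * (integerDigits + digits + 2);
}
```

Under those assumptions, estimatedSize(900000, 5) is 19800000 characters for one price series, while neededSize(900000, 1, 5) is 7200000, so the hardcoded 15 reserves about 2.75 times more than required.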

Thank you again for the excellent approach to the problem. I never thought about it because I'm exporting the data in binary format for the EA Studio data feeds.

Re: Supercharging the Data Export Script: From 5 Minutes to 7 Seconds!

Here are versions of the scripts with code cleanup.

Also:
- Optimised the size of the intermediate char arrays.
- Fixed the code for finding the data starting bar.
- The script shows the date of the data starting bar.



https://image-holder.forexsb.com/store/data-export-script-optimised.png

Please test the scripts.
I'll attach them to EA Studio if everything is fine.

Happy Christmas!

Post's attachments

data-export.mq4 15.65 kb, 1 downloads since 2024-12-24 

data-export.mq5 17.23 kb, 3 downloads since 2024-12-24 


Re: Supercharging the Data Export Script: From 5 Minutes to 7 Seconds!

Hello Popov,

Thank you for your detailed analysis and improvements! Let me explain the reasoning behind my approach and acknowledge where your version makes significant improvements:

1. About the "15" estimation:
You were correct - it was a rough estimation for the maximum expected digits before the decimal point in price values. Your approach of calculating exact sizes by finding maximum values first is much more precise and professional. It prevents any potential buffer overflows and is the proper enterprise-level solution.

2. Array Size Optimization:
Your implementation with GetCountOfDigits() and pre-calculation of exact array sizes is excellent. While my estimation approach might be slightly faster by avoiding the initial data pass, your solution provides guaranteed safety and precision, which is more important for production code.

3. FromDate Handling:
Your while-loop approach for calculating the FromDate bar index is more elegant than my solution. It provides better clarity and reliability.

4. Additional Improvements in Your Version:
- Adding the start date to the output is very useful for users
- Better error handling for bar count
- More comprehensive code structure and comments
- Consistent coding style throughout

Would you consider one more optimization? For very large datasets, we could cache the maximum values during the initial CopyRates call to avoid the extra loop, though this would make the code more complex.
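As a rough illustration of sizing the buffer from the maximum value, here is a sketch in standard C++. It is not the GetCountOfDigits() implementation from the attached scripts, which I am not reproducing here; printedWidth and tightSeriesSize are hypothetical names, and the prices are assumed to be non-negative. It still walks the series once to find the maximum, so it shows the sizing logic rather than a way to skip that pass.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

// Characters needed to print `value` with `digits` decimals,
// e.g. 1.10000 needs 7. Measuring the formatted text also covers
// values that round up to an extra digit (9.999996 -> 10.00000).
int printedWidth(double value, int digits)
{
    return std::snprintf(nullptr, 0, "%.*f", digits, value);
}

// Buffer size for one series of non-negative prices: no value prints
// wider than the largest one, and each value is followed by one separator.
std::size_t tightSeriesSize(const std::vector<double> &values, int digits)
{
    if (values.empty())
        return 0;
    const double maxValue = *std::max_element(values.begin(), values.end());
    return values.size() * static_cast<std::size_t>(printedWidth(maxValue, digits) + 1);
}
```

For example, tightSeriesSize({1.1, 1.2, 150.25}, 5) is 30: the widest value prints as 150.25000 (9 characters), plus one separator, times three values.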

Your version strikes an excellent balance between performance, safety, and code quality. I appreciate how you improved upon my initial optimization attempt while maintaining production-grade reliability. I've removed my version to avoid confusion for new users, but can't do so for the first post anymore (can't edit it at this stage).

Thank you :-)