1. Introduction
The daily trading volume of the foreign exchange market exceeds 5 trillion US dollars, presenting immense opportunities and risks. Accurate price prediction is crucial for formulating effective trading strategies. However, forex data is characterized by high volatility, significant noise, and complex nonlinear patterns, making prediction extremely challenging. Traditional linear models (such as ARIMA) often struggle to capture these dynamic features. This paper proposes a novel hybrid method that combines Wavelet Denoising, an Attention-based Recurrent Neural Network (ARNN), and an Autoregressive Integrated Moving Average (ARIMA) model, aiming to address the linear and nonlinear components of forex time series separately and thereby achieve superior predictive performance.
2. Related Literature
2.1 Wavelet Denoising
Wavelet transform is a powerful time-frequency analysis tool that effectively separates signal from noise in non-stationary financial data. By decomposing a time series into approximation and detail coefficients, high-frequency noise components that may obscure underlying trends and autocorrelation structures can be selectively removed. This preprocessing step is crucial for improving the quality of model inputs.
2.2 Neural Networks in Finance
Neural networks, particularly Recurrent Neural Networks (RNN) and their variants (e.g., LSTM), have shown potential in modeling complex nonlinear financial time series. The integration of attention mechanisms, as demonstrated by the Transformer model, enables the network to focus on historical observations most relevant for prediction, thereby enhancing sequence modeling capabilities.
2.3 Hybrid Forecasting Models
The "decomposition and integration" paradigm has been widely recognized. Its core idea is to use different models to capture distinct data characteristics (e.g., linear vs. nonlinear, trend vs. seasonality) and then combine their forecasts. The contribution of this paper lies in the specific combination of wavelet denoising for preprocessing, ARNN for modeling nonlinear patterns, and ARIMA for handling residual linear components.
3. Methodology
3.1 Data Preprocessing and Wavelet Denoising
The original foreign exchange price series $P_t$ is decomposed using the Discrete Wavelet Transform (DWT): $P_t = A_J + \sum_{j=1}^{J} D_j$, where $A_J$ is the approximation coefficient (low-frequency trend), and $D_j$ is the detail coefficient (high-frequency noise at the $j$-th level). A threshold function (e.g., soft thresholding) is applied to the detail coefficients to suppress noise, followed by reconstruction to obtain the denoised series $\tilde{P}_t$.
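The decompose, threshold, and reconstruct steps described above can be sketched in a few lines. This is a minimal illustration only: it assumes the Haar wavelet, a fixed threshold, and an input length divisible by $2^J$, none of which are specified in the paper. The function names (`haar_dwt`, `soft_threshold`, `wavelet_denoise`) are hypothetical.

```python
import math

SQRT2_INV = 1.0 / math.sqrt(2.0)

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) * SQRT2_INV for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SQRT2_INV for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from one level of coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT2_INV, (a - d) * SQRT2_INV])
    return out

def soft_threshold(coeffs, thr):
    """Soft thresholding: shrink each coefficient toward zero by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def wavelet_denoise(prices, levels=2, thr=0.5):
    """Decompose, threshold the detail coefficients, then reconstruct.

    Assumption: len(prices) is divisible by 2**levels (no padding is done).
    """
    if len(prices) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    approx = list(prices)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(soft_threshold(d, thr))
    # Reconstruct from the coarsest level back to the original resolution.
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx
```

With `thr=0` the series is reconstructed unchanged; with a very large threshold all detail is removed and only the block-wise trend remains. In practice the wavelet family, the level $J$, and the threshold rule would need to be chosen and validated on the data.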
3.2 Attention-based RNN (ARNN) Architecture
The model employs an encoder-decoder RNN framework with an attention layer. The encoder (LSTM units) processes the input sequence $\tilde{P}_{t-n:t-1}$ and generates a sequence of hidden states $h_i$. The attention mechanism computes a context vector $c_t$ as a weighted sum of these encoder states: $c_t = \sum_{i=1}^{n} \alpha_{t,i} h_i$, where the attention weights $\alpha_{t,i}$ are learned by a feedforward network. The decoder LSTM then uses $c_t$ and its previous state to predict the nonlinear component $\hat{N}_t$.
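The context-vector formula above can be illustrated with a small sketch. It assumes the raw attention scores are already available (in the paper they come from a feedforward network, which is not reproduced here) and that they are normalized with a softmax, a common choice that the text does not state explicitly. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax: turns raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(hidden_states, scores):
    """Compute c_t = sum_i alpha_{t,i} * h_i.

    hidden_states: list of n encoder states h_i, each a list of floats.
    scores: list of n raw relevance scores (assumed to be given).
    Returns (alphas, context).
    """
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(a * h[k] for a, h in zip(alphas, hidden_states))
        for k in range(dim)
    ]
    return alphas, context

# Usage: two encoder states with equal scores give an evenly weighted context.
alphas, context = attention_context([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

A higher score for one time step pulls the context vector toward that step's hidden state, which is how the decoder is able to focus on the most relevant part of the input window.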
3.3 ARIMA Model Specification
The ARIMA(p,d,q) model fits the linear relationships in the time series. After the ARNN captures the nonlinear component, the residual sequence $R_t = \tilde{P}_t - \hat{N}_t$ is modeled by ARIMA: $\phi(B)(1-B)^d R_t = \theta(B) \epsilon_t$, where $\phi$ and $\theta$ are the AR and MA polynomials, $B$ is the backshift operator, $d$ is the differencing order, and $\epsilon_t$ is white noise. This yields the linear prediction $\hat{L}_t$.
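To make the residual-modeling step concrete, the sketch below forms $R_t$ and produces a one-step forecast under an ARIMA(1,1,0) specification fitted by least squares. The order (1,1,0), the no-intercept fit, and the function names are assumptions made for illustration; the paper does not report its chosen $(p,d,q)$, and a real implementation would normally rely on a statistics library with proper order selection.

```python
def residual_series(denoised, nonlinear_pred):
    """R_t = P~_t - N^_t for aligned sequences."""
    return [p - n for p, n in zip(denoised, nonlinear_pred)]

def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + eps_t (no intercept)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(x * x for x in series[:-1])
    return num / den if den else 0.0

def arima_110_forecast(residuals):
    """One-step-ahead forecast of the residual under ARIMA(1,1,0).

    Differencing once (d=1) gives x_t = R_t - R_{t-1}; an AR(1) is fitted to x_t,
    and the forecast is R_last + phi * x_last.
    """
    diffs = [residuals[t] - residuals[t - 1] for t in range(1, len(residuals))]
    phi = fit_ar1(diffs)
    return residuals[-1] + phi * diffs[-1], phi

# Usage with made-up numbers (not data from the paper):
denoised = [10.0, 11.5, 12.5, 13.25, 13.875]
nonlinear = [10.0, 10.5, 11.0, 11.5, 12.0]
forecast, phi = arima_110_forecast(residual_series(denoised, nonlinear))
```

In this toy input the differenced residuals halve at each step, so the fitted coefficient is 0.5 and the forecast extends the residual by half of its last increment.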
3.4 Hybrid Ensemble Strategy
The final prediction $\hat{P}_t$ is a simple additive combination of the predictions from the two component models: $\hat{P}_t = \hat{N}_t + \hat{L}_t$. This assumes that the linear and nonlinear components are additive and have been effectively separated through the modeling process.
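As a short worked consequence of the definitions above (using only $R_t = \tilde{P}_t - \hat{N}_t$ from Section 3.3 and the additive rule), the hybrid's error on the denoised series reduces to the ARIMA error on the residual:

$$\tilde{P}_t - \hat{P}_t = \tilde{P}_t - \hat{N}_t - \hat{L}_t = R_t - \hat{L}_t$$

In other words, under this scheme any structure in $R_t$ that the ARIMA stage fails to capture passes directly into the final forecast error.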
Core Performance Metrics
- Root Mean Squared Error (RMSE): 1.65
- Directional Accuracy (prediction success rate): ~76%
- Foreign exchange market size (daily trading volume): >$5 trillion
4. Experimental Results
4.1 Dataset and Experimental Setup
The experiment is based on high-frequency USD/JPY five-minute exchange rate data. The dataset is divided into a training set, a validation set, and a test set. Baseline models for comparison include the standalone ARIMA model, the standard LSTM model, and other neural network architectures from the relevant literature.
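The paper does not state how the three sets are formed. For time series, a chronological split is the usual way to avoid leaking future information into training, and a minimal sketch of that approach follows. The 70/15/15 proportions and the function name are assumptions, not values from the paper.

```python
def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split a time-ordered series into train/validation/test without shuffling.

    The fractions are illustrative defaults; the remainder goes to the test set.
    """
    n = len(series)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return series[:train_end], series[train_end:val_end], series[val_end:]

# Usage: 20 five-minute bars, split 50% / 25% / 25% in time order.
train, val, test = chronological_split(list(range(20)), train_frac=0.5, val_frac=0.25)
```

Every validation observation comes after the last training observation, and every test observation after the last validation one.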
4.2 Performance Metrics and Comparison
The proposed hybrid model achieves a root mean squared error (RMSE) of 1.65 and a directional accuracy (DA) of approximately 76%. This outperforms all baseline models. For example, a standalone ARIMA model may achieve around 55-60% DA, while a standard LSTM may reach about 65-70%, highlighting the value of hybrid methods and preprocessing.
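For reference, the two metrics can be computed as sketched below. RMSE follows its standard definition. The paper's exact definition of DA is not given here, so the sketch assumes a common one: a prediction counts as a hit when the predicted move from the previous actual price has the same sign as the realized move.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two aligned sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def directional_accuracy(actual, predicted):
    """Fraction of steps where the predicted direction matches the realized one.

    Assumed definition: compare sign(predicted[t] - actual[t-1]) with
    sign(actual[t] - actual[t-1]); flat moves count as misses.
    """
    hits = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        predicted_move = predicted[t] - actual[t - 1]
        if actual_move * predicted_move > 0:
            hits += 1
    return hits / (len(actual) - 1)

# Usage with made-up prices (not data from the paper):
actual = [100.0, 101.0, 100.0, 102.0]
predicted = [100.0, 100.5, 100.5, 99.5]
error = rmse(actual, predicted)
da = directional_accuracy(actual, predicted)  # 2 of 3 directions correct
```

The two metrics can disagree: a forecast may sit close to the true price on average yet land on the wrong side of the previous price, which is why both are reported.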
4.3 Result Analysis and Discussion
The significant improvement in directional accuracy is particularly noteworthy for trading applications, where predicting the correct direction of price movement (up/down) is often more critical than the exact price point. The lower RMSE indicates that overall prediction error has also been reduced relative to the baselines. The results support the hypothesis that wavelet denoising stabilizes the input data and that the hybrid model effectively captures both linear and nonlinear dependencies.
5. Technical Analysis and Expert Insights
Core Insights
This paper is not merely another "AI for Finance" project; it is a shrewd engineering practice that recognizes a fundamental truth: financial markets are multi-mechanism systems. They are neither purely chaotic nor completely predictable; they oscillate between periods of trend-following (capturable by linear models) and complex, news-driven shock periods (requiring nonlinear models). The author's core insight is to force the architecture to explicitly model this duality, rather than hoping a single monolithic network will resolve it on its own.
Logical Flow
The process logic is elegant: 1) Clean the signal (wavelet denoising): This is essential. Feeding raw, noisy high-frequency data into any model is asking for trouble, as noise will dominate the gradient. Using wavelets is superior to a simple moving average because it preserves local features. 2) Divide and conquer (ARNN handles nonlinearity, ARIMA handles linearity): This is a stroke of genius. It follows the principle behind the "No Free Lunch" theorem in machine learning: no single model is optimal for all problems. Let the specialized tool (ARIMA) handle the easily understood linear autocorrelation, thereby freeing the powerful but data-hungry ARNN to focus on deciphering complex nonlinear patterns. 3) Recombine (additive ensemble): Simple summation is effective, provided the captured components can be assumed to be orthogonal.
Strengths and Weaknesses
Strengths: The approach is reasonably defensible and interpretable. You can inspect the ARIMA residuals and the ARNN attention weights. Its performance (reaching 76% DA on 5-minute forex data) is meaningful in practice and exceeds common benchmarks. It is a robust framework that can be applied to any noisy, non-stationary series beyond forex (such as cryptocurrencies or volatile commodities).
Weaknesses and Major Flaws: One obvious problem is the lack of a realistic trading simulation. High DA and low RMSE on the test set do not equate to profitability. Within a 5-minute time window, transaction costs, slippage, and latency can completely offset paper returns. The model is purely technical and ignores macroeconomic news flow and order book data, a serious limitation in today's algorithmic trading environment. Furthermore, additive combination is too simplistic; a learned weighting mechanism (e.g., a gating network) that dynamically adjusts each model's contribution based on market regimes would be more flexible, an approach hinted at in meta-learning research by institutions like DeepMind and others.
Actionable Insights
For quantitative analysts and asset managers: Replicate, then extend. Use this architecture as your new baseline. The immediate next steps are: 1) Integrate alternative data: Feed embedding vectors from real-time news sentiment analysis (using models such as FinBERT) into the ARNN encoder together with price data. 2) Implement dynamic weighting: Replace the fixed $\hat{N}_t + \hat{L}_t$ with $w_t \hat{N}_t + (1-w_t)\hat{L}_t$, where $w_t$ is the output of a small neural network that estimates the current market's degree of "nonlinearity". 3) Backtest with friction: Feed the predictions into a realistic backtesting engine that includes costs. The true value of the 76% DA can only be seen in this context. This paper provides the engine module; the industry must now build the other parts of the trading toolkit around it.
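Two of these steps, dynamic weighting and backtesting with friction, can be sketched briefly. Everything here is a hypothetical illustration rather than the paper's method. The gate is a single sigmoid of a volatility feature standing in for the "small neural network", and the backtest holds one unit long or short according to the forecast direction and charges a flat cost on every position change.

```python
import math

def gate_weight(volatility, bias=0.0, slope=1.0):
    """Hypothetical gate: maps a volatility feature to w_t in (0, 1) via a sigmoid."""
    return 1.0 / (1.0 + math.exp(-(bias + slope * volatility)))

def gated_forecast(nonlinear_pred, linear_pred, w):
    """w_t * N^_t + (1 - w_t) * L^_t, as proposed in the text above."""
    return w * nonlinear_pred + (1.0 - w) * linear_pred

def backtest_with_costs(prices, forecasts, cost_per_trade):
    """Toy directional backtest.

    forecasts[t] is the prediction for prices[t + 1], made at time t.
    Go long one unit if the forecast is above the current price, else short.
    A flat cost is charged whenever the position changes (a simplification:
    slippage and latency are not modeled).
    """
    pnl = 0.0
    position = 0
    for t in range(len(prices) - 1):
        target = 1 if forecasts[t] > prices[t] else -1
        if target != position:
            pnl -= cost_per_trade
            position = target
        pnl += position * (prices[t + 1] - prices[t])
    return pnl

# Usage with made-up prices: perfect direction calls, with and without costs.
prices = [100.0, 101.0, 100.0, 102.0]
forecasts = [101.0, 100.0, 103.0]
gross = backtest_with_costs(prices, forecasts, cost_per_trade=0.0)
net = backtest_with_costs(prices, forecasts, cost_per_trade=1.5)
```

Even with every direction called correctly, the toy run turns from a gain into a loss once the per-trade cost is large relative to the five-minute price moves, which is the concern raised above.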
6. Analytical Framework and Case Examples
Scenario: Predict the next five-minute candlestick for EUR/USD during major central bank announcements (e.g., the ECB press conference).
Framework application:
- Wavelet preprocessing: The original 5-minute price series over the past 4 hours (48 data points) is decomposed. The high-frequency "detail" coefficients that spike during announcement periods are heavily thresholded, thereby smoothing micro-noise while preserving major directional jumps.
- Model decomposition:
  - ARIMA component: Models the underlying momentum and mean-reversion behavior from before the news release. Its forecast is likely to be a mild continuation of the pre-news trend.
  - ARNN component: The attention mechanism focuses heavily on the most recent, volatile price bars following the announcement. It learns from similar historical "news shock" patterns to predict the likely short-term overshoot and the reversal that follows.
- Hybrid forecast: Final forecast = (ARIMA trend-based forecast) + (ARNN news-impact correction). This is more nuanced than any single model, which may either under-react (ARIMA) or overfit to noise (a standard RNN trained on raw data).
7. Future Applications and Directions
- Multi-asset and cross-market forecasting: Extend the framework to model relationships among currencies, equities, and bonds. The ARNN encoder can process multiple related time series simultaneously.
- Integration with Reinforcement Learning (RL): Use the hybrid model's forecasts as the state representation for an RL agent that learns optimal trade-execution strategies, optimizing profit directly rather than forecast error.
- Explainable AI (XAI) enhancement: Develop methods that attribute the final forecast to specific linear patterns (via the ARIMA coefficients) and to specific past time points (via the ARNN attention maps), giving traders actionable reasons for each forecast.
- Adaptive online learning: Implement mechanisms that let the model continuously update its parameters with new data in a streaming fashion, adapting to changing market regimes and moving beyond the static train-test paradigm.
8. References
- Bank for International Settlements (BIS). (2019). Triennial Central Bank Survey of foreign exchange and OTC derivatives markets.
- Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis: forecasting and control. John Wiley & Sons.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
- Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159-175.
- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
- Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE transactions on evolutionary computation, 1(1), 67-82.
- DeepMind. (2023). Research in Adaptive Agents. Retrieved from https://www.deepmind.com/research/highlighted-research/adaptive-agents