Hualin Luan Cloud Native · Quant Trading · AI Engineering

Record of Quantitative Trading System Development (6): Architecture Evolution and Reconstruction Decisions

Review the five refactorings of Micang Trader, explaining how the system evolved from the initial snapshot to a clearer target architecture, and incorporated technical debt and ADR decisions into long-term governance.

Meta

Published: 3/31/2026
Category: guide
Reading Time: 62 min read

Readers can treat this article as a review of architecture evolution and technical debt: the five refactorings were not pursued for formal elegance, but as a systematic response to real defects, performance bottlenecks, testing pressure, and collaboration costs.

Series reading order

Part 1 -> Part 2 -> Part 3 -> Part 4 -> Part 5 -> Part 6 -> Part 7. Part 6 comes after the performance chapter because refactoring is not an abstract aesthetic exercise but a systematic response to real defects, test pressure, and performance bottlenecks.

The following five refactorings all revolve around the same question: when a trading system moves from merely running to being maintainable, verifiable, and scalable, which boundaries must be redrawn, and which technical debt must be explicitly managed? Readers need not memorize every class name and implementation detail; it is enough to hold one main thread: every refactoring should start from a real risk, change boundaries in a verifiable way, and record the cost of the new boundary.


How to read these five refactorings

In a trading system, refactoring is not about making the directory tree prettier but about making the system easier to reason about under real pressure: as market data keeps streaming in, K-line attribution must stay stable; as indicators keep updating, results must stay verifiable; when the user drags the chart, the interface must not be buried under historical data; and when backtesting, live monitoring, and data loading run at the same time, a failure in one execution domain must not take down the whole terminal.

When reading these five refactorings, the most important thing is not to memorize the class names but to see the ordering between them. Once data boundaries are clear, tests can bypass the GUI and verify trading semantics directly; once computing state is explicit, performance optimization stops being blind hardware stacking; once chart responsibilities are separated, virtual rendering no longer piles up inside a God class; once execution domains are isolated, long-term governance gains stable fault boundaries and review entry points.

Figure 1: Causal timeline of the five refactorings, from responsibility boundaries through execution-domain isolation to a closed loop of long-term governance.

The first step must address data responsibilities. When ChartWidget is simultaneously responsible for database access, K-line conversion, indicator calculation, rendering, and interaction handling, any small change ripples through multiple layers: changing a data source affects the UI, adjusting a trading session affects the chart, and fixing an indicator defect may alter window state. The first refactoring separates the data layer from the UI layer; the real question it answers is "who owns the data semantics, and who is only responsible for presentation". Until this is solved, subsequent unit tests, backtest verifications, and performance comparisons lack a stable entry point.

The second step deals with computing state. Pandas full recalculation is convenient on small samples, but once layered indicators, sliding windows, and real-time incremental market data are stacked together, each new K-line can trigger an unnecessary replay of history. The value of IncrementalMA is not a fancier hand-written indicator class; it reframes the question as "when a new K-line arrives, which minimal set of state does the system need to update?". Only when computing state is under control does performance optimization have a clear target; otherwise optimization just makes wasted work run faster.

The third step separates chart responsibilities. Even after the data layer is independent, the chart can become a new monolith: when data model, coordinate conversion, drawing logic, mouse interaction, and indicator overlays are all crowded together, every new requirement keeps inflating the component's cognitive load. The third refactoring applies MVC ideas to separate the data model, renderer, and interaction controller, so that "what the data is", "how it is drawn", and "how user actions change the view" can each be discussed on its own. This step does not look like performance optimization, but it determines whether later rendering optimization lands on the correct boundary.

The fourth step is rendering cost. Once the number of K-lines grows from a few thousand to a hundred thousand, the bottleneck shifts from data reading to the view layer. VirtualizedCandleRenderer is not simply "drawing faster"; it splits the full data pool, visible window, buffer, off-screen cache, and texture reuse into distinct concepts: whatever is on screen is processed first, while data currently out of view stays in the data layer and no longer participates in every full redraw. It is the trading-chart counterpart of a front-end virtual list, except the items are not ordinary DOM rows but candlesticks, volume bars, indicator lines, and interaction markers.
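The visible-window idea can be sketched in a few lines. `Candle` and `visible_slice` are illustrative names, not the project's actual VirtualizedCandleRenderer API:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candle:  # illustrative stand-in for a K-line record
    open: float
    high: float
    low: float
    close: float

def visible_slice(candles: List[Candle], first_visible: int,
                  visible_count: int, buffer: int = 50) -> List[Candle]:
    """Return only what the viewport needs: the visible range
    plus a small buffer on each side for smooth dragging."""
    start = max(0, first_visible - buffer)
    end = min(len(candles), first_visible + visible_count + buffer)
    return candles[start:end]

# A 100,000-candle pool: dragging only ever touches a few hundred objects.
pool = [Candle(i, i + 1, i - 1, i) for i in range(100_000)]
window = visible_slice(pool, first_visible=50_000, visible_count=200)
print(len(window))  # 300: 200 visible + 2 × 50 buffer
```

The full pool stays in the data layer; only the returned slice ever reaches drawing code.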

The fifth step deals with execution domains. When the UI, indicator calculation, backtesting, and data loading all live in one Python process, the GIL, long-running tasks, and blocking I/O amplify one another, surfacing as interface freezes, live-trading monitoring delays, or exceptions that are hard to localize. The goal of multi-process plus shared_memory is not parallelism for its own sake, but splitting heavy computation, shared data, and UI responsiveness into isolated failure domains: if the indicator-calculation process crashes, the main interface must not lose responsiveness; if a backtest saturates the CPU, live-trading monitoring must not slow down; if a shared-memory write goes wrong, the system must be able to locate which stage of the data life cycle failed.
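The life-cycle discipline the paragraph mentions can be sketched with the standard library's `shared_memory`; the block name and `write_closes`/`read_closes` helpers are illustrative, not the project's API. The key convention: exactly one owner creates and unlinks, readers only attach and detach:

```python
from multiprocessing import shared_memory
from typing import List
import struct

def write_closes(name: str, closes: List[float]) -> shared_memory.SharedMemory:
    """Owner process: create the block and keep the handle for cleanup."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=8 * len(closes))
    struct.pack_into(f"{len(closes)}d", shm.buf, 0, *closes)
    return shm

def read_closes(name: str, count: int) -> List[float]:
    """Reader process: attach by name, never create, always detach."""
    shm = shared_memory.SharedMemory(name=name)  # attach only
    try:
        return list(struct.unpack_from(f"{count}d", shm.buf, 0))
    finally:
        shm.close()  # readers detach; only the owner unlinks

owner = write_closes("mc_demo_closes", [101.5, 102.0, 101.8])
try:
    print(read_closes("mc_demo_closes", 3))  # [101.5, 102.0, 101.8]
finally:
    owner.close()
    owner.unlink()  # exactly one process owns the unlink step
```

With this split, "shared memory writes abnormally" becomes a question with a bounded answer: the failure is either in creation, packing, attachment, or release.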

Finally comes loss of control over decisions. The larger the system, the more easily "let's write it this way for now" becomes the hardest debt to repay. The point of ADRs, DEBT-* records, and a Code Review checklist is to preserve the context, benefits, costs, and review conditions of each architectural choice. They do not prove a choice was always right; they let the next maintainer answer three questions: why it was done at the time, whether conditions have changed since, and where to start if the boundary needs adjusting.
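A minimal ADR skeleton along these lines makes those three questions answerable later. The field names follow a common ADR convention, and the record numbers and contents here are hypothetical examples, not the project's actual files:

```markdown
# ADR-012: Move indicator calculation to incremental updates

- Status: accepted            # proposed / accepted / superseded
- Date: 2026-03-01
- Context: backtest profiling shows indicator recomputation dominates runtime
- Decision: sliding-window indicators maintain explicit incremental state
- Consequences: faster backtests; state init and replay now need dedicated tests
- Review condition: revisit if gap-triggered rebuilds become frequent
- Related debt: DEBT-031 (RSI seeding is still approximate)
```

The "review condition" line is what distinguishes an ADR from a postmortem: it states in advance when the decision should be re-examined.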

This causal chain also explains why the refactoring order cannot be shuffled at will. Blurred responsibilities make testing difficult; difficult testing leaves performance optimization without evidence of correctness; optimization without evidence amplifies refactoring risk; once refactoring risk rises, the team keeps piling on local patches and debt; and once debt accumulates far enough, new features, defect fixes, and fault localization all slow down. Conversely, the first refactoring establishes boundaries, the second controls computational complexity, the third strips view responsibilities, the fourth reduces rendering cost, the fifth isolates execution failures, and long-term governance consolidates these changes into a repeatable mechanism.

The following reading frame can help readers locate key points in long articles:

| Reading question | Refactoring | Judgment criterion | Evidence to watch |
| --- | --- | --- | --- |
| Are data semantics polluted by the UI? | First | Whether data access, transformation, and display are layered | Whether ChartWidget exits the data-owner role |
| Are indicators slowed by full recomputation? | Second | Whether a new K-line updates only the necessary state | Whether IncrementalMA keeps only minimal state |
| Can chart responsibilities evolve independently? | Third | Whether data model, renderer, and interaction are separated | Whether testing is easier after the MVC split |
| Is the chart still smooth at large data volumes? | Fourth | Whether a visible window replaces full drawing | Whether VirtualizedCandleRenderer handles only the viewport |
| Does recomputation affect the UI and live monitoring? | Fifth | Whether process boundaries and shared memory are explicit | Whether shared_memory reads and writes have life-cycle constraints |
| Can architectural choices be reviewed? | Debt governance | Whether ADRs, DEBT records, and review conditions exist | Whether the next maintainer can understand the decision |

If there is only one principle to remember, it is this: refactoring a trading system should not start from "where the code is ugly" but from "which boundaries are producing errors, delays, or collaboration costs". Code style can be handled by lint. Problems that genuinely warrant architectural refactoring usually share three traits: the problem spans multiple modules, it has already caused testing, performance, or operations risk, and local patches have failed to suppress it over time. Only when these signals appear together is refactoring worth a formal decision.

This is why the five refactorings must be read together with their evidence. After decoupling the data layer and UI, tests must prove that strategy logic no longer depends on window objects; after incremental indicators replace Pandas full recalculation, benchmarks must prove the computation time dropped and regression tests must prove the results stayed consistent; after VirtualizedCandleRenderer lands, interaction tests and frame-rate data must prove dragging stays smooth; after the multi-process architecture goes live, fault injection must prove a sub-process failure cannot bring down the main interface. Without this evidence, refactoring is just a large-scale code move.

Viewed together, the five refactorings look less like five isolated cases and more like a maintenance mechanism. The first makes data input trustworthy; the second makes computing state controllable; the third makes interface boundaries detachable; the fourth makes interactive performance measurable; the fifth makes execution failures isolable. Once these capabilities exist, the technical-debt list, ADRs, and Code Review checklist stop being mere documents and become operating mechanisms that keep constraining system evolution. The real reliability of a long-running quantitative system is not that it is never refactored, but that every refactoring comes with evidence, boundaries, rollbacks, and review conditions.

This main thread also helps readers place their own system: if switching data sources still affects the UI, don't talk about multi-process; if indicator results are not yet pinned to a reference implementation, don't rush into incremental optimization; if the chart component is still a monolith, don't treat virtual rendering as a silver bullet. The order of architecture evolution is itself part of risk control: later optimizations depend on the boundaries and tests established earlier.

In other words, the refactoring order is not the layout order of this article; it is the order in which system risks are dismantled layer by layer.

Technical debt itself is not frightening; what is frightening is not knowing where the debt sits, why it formed, and when it must be repaid. A trading system can accept short-term trade-offs, but not unrecorded ones. Temporary code should map to a DEBT-* record, an architectural choice to an ADR, a performance optimization to a benchmark, and an interface change to regression tests. This makes the early pace feel slower, but it keeps the system from suddenly losing control under the pressure of real market data, real users, and real money.

The costs of refactoring must be seen at the same time. After the first refactoring splits out the data layer, data-object life cycles must be redefined; after the second introduces incremental indicators, state initialization and replay recovery demand more care; after the third splits the chart component, event subscription and render-refresh ordering must be reorganized; after the fourth introduces VirtualizedCandleRenderer, visible windows, cache invalidation, and mouse interaction need a consistent protocol; after the fifth introduces multi-process, serialization, shared-memory release, exception propagation, and log correlation all become new governance objects. Benefits and costs must enter the judgment together, or refactoring easily turns from risk control into a new source of complexity.

Therefore, every refactoring should have entry and exit conditions. Entry conditions include: the defect has recurred, local patches only move the problem around, test or performance data proves the risk exists, and the team knows the cost of not refactoring. Exit conditions include: old behavior is protected by regression tests, key metrics are compared before and after, abnormal paths have a degradation plan, the new boundary is recorded in documentation, and the next maintainer can understand it independently. Without entry conditions, refactoring degenerates into technical taste; without exit conditions, it degenerates into an open-ended project.

If readers are maintaining their own quantitative system, they can directly use the following checklist to evaluate whether to enter refactoring:

  • Does the problem span more than two modules, rather than being a local flaw in a single function?
  • Is there impact real users can perceive, such as UI freezes, misaligned indicators, inconsistent backtest results, or faults that are hard to localize?
  • Is there already a minimal reproduction, performance data, or a failing test case, rather than a judgment formed only by reading code?
  • Can the change be rolled out in stages without altering external behavior, with a rollback path preserved?
  • Can the refactoring outputs be written into an ADR, the DEBT list, the Code Review checklist, and a test plan?
  • Has the cost of not refactoring been made explicit, including the future blast radius of new features, testing cost, and the probability of incidents?

The value of this list is to turn “should it be refactored” from a subjective debate into an evidence-driven judgment. The trading system especially requires this kind of discipline, because it does not face one-time page delivery, but long-term operation, continuous iteration and real financial risk. Any architectural change should make the system easier to reason about, not just make the directory structure look tidier.

The five refactorings also share one trait: each reduces the amount of context a human brain must hold at once. With an independent data layer, readers no longer parse data cleaning inside UI code; with independent incremental indicators, readers no longer replay the full historical window each time; with virtualized rendering, readers no longer conflate the hundreds of thousands of K-lines in the data pool with the few hundred on screen; with multi-process boundaries, readers no longer keep UI responsiveness and CPU-intensive computation in the same failure domain. Good architecture is not about code looking sophisticated; it is about enabling maintainers to make correct judgments under pressure.

If you want to migrate this method to your own system, you can start with a minimal closed loop: choose a module that has problems repeatedly, record the current symptoms, add tests or benchmarks that can reproduce the problem, and then write an ADR explaining why it needs to be changed, how to roll back, and when to review. Only move boundaries when the chain of evidence is complete. This order avoids the risk of “making big changes first and adding evidence later” and also allows the team to discuss facts rather than personal preferences during code reviews.

This is also the connection point between Part 6 and the previous articles: the defect catalog provides problem samples, the testing article provides a safety net, the performance article provides measurement methods, and the refactoring article transforms these evidence into boundary adjustments. Without previous evidence, architectural evolution will become an abstract slogan; without subsequent evolution, previous repairs will gradually pile up new debts.

This article therefore works best as an architectural checklist for the long-term maintenance phase. Its ideas also apply to backtesting platforms, risk-control platforms, and desktop trading terminals.


Introduction: Refactoring Decisions and Technical Debt Management

In the development of a quantitative trading system, architectural refactoring is an inevitable stage. As the codebase grows and components become more tightly coupled, managing technical debt becomes a key factor in the project's success.

This part draws on the practical experience of the micang-trader project. Readers can see how refactoring decisions grow out of defects, performance, testing, and collaboration costs, and what evidence is needed to support a sustainable architecture-evolution mechanism.


Part One: A record of the five refactorings of micang-trader

The first refactoring: decoupling the data layer and UI layer

Architectural dilemma before refactoring

Some time after the project started, obvious architectural problems had emerged.

Typical code smell:

class ChartWidget(QWidget):  # illustrative code, not production code; assumes `import sqlite3` and a Qt binding
    """Chart component that mixes data access, processing, and rendering."""

    def load_kline_data(self, symbol: str, days: int = 30):
        """Read K-line data directly from the database."""
        conn = sqlite3.connect(self.db_path)
        # note: str.format inside SQL is itself a smell (injection-prone)
        cursor = conn.execute(
            """SELECT datetime, open, high, low, close, volume
               FROM bar_data
               WHERE symbol = ? AND datetime > date('now', '-{} days')
               ORDER BY datetime""".format(days),
            (symbol,)
        )
        rows = cursor.fetchall()
        conn.close()

        # build UI-facing data structures directly inside the widget
        self.kline_data = [
            {
                'datetime': row[0],
                'open': float(row[1]),
                'high': float(row[2]),
                'low': float(row[3]),
                'close': float(row[4]),
                'volume': int(row[5])
            }
            for row in rows
        ]

        self.update()  # trigger repaint

The problem with this code:

  1. Mixed responsibilities: the UI component operates the database directly
  2. Hard to test: unit tests require mocking SQLite
  3. Duplicated code: 8 components contain similar SQL queries
  4. Not reusable: data-acquisition logic is bound to the Qt component

This figure answers the boundary question first: before refactoring, ChartWidget both retrieves and interprets data, tying GUI code to data semantics; after refactoring, DataService becomes the single entry point for data semantics, and the UI only consumes prepared K-line and indicator inputs.

Figure 2: Architecture before and after decoupling the data layer from the UI layer. ChartWidget no longer directly owns data semantics.

Pain points:

  • Changing data sources requires modifying a large number of files
  • The same “get the last 30 days K-line” logic is repeated in multiple places
  • Test coverage is low (because it is difficult to mock the database)
  • New feature development speed is limited

Refactor decision analysis

Signals that triggered the refactoring (several were met at once):

| Signal | Status quo | Threshold | Triggered |
| --- | --- | --- | --- |
| Code duplication | High | > 20% | Yes |
| Lines in a single file | Over threshold | > 1,000 | Yes |
| Test difficulty | Requires a mock DB | Should be testable in isolation | Yes |
| Blast radius of a change | Many files | < 5 files | Yes |

Refactoring Goals:

  • Centralize data access logic into 1-2 files
  • The chart component obtains data through the interface and does not directly access the database.
  • Supports unit testing (no real database required)
  • Changing the data source only requires changing 1 file

Cost-benefit assessment:

refactoring cost ≈ 40
refactoring benefit ≈ 120
risk cost ≈ 10 (bugs potentially introduced by the migration)

ROI > 1.5, so the refactoring was worth doing.
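One plausible reading of the assessment above, assuming cost, benefit, and risk are expressed in comparable effort units (the original does not state the units, and `refactor_roi` is an illustrative helper, not a project function):

```python
def refactor_roi(benefit: float, cost: float, risk: float) -> float:
    """Net benefit after risk, relative to the refactoring cost."""
    return (benefit - risk) / cost

# With the figures above: (120 - 10) / 40
print(refactor_roi(benefit=120, cost=40, risk=10))  # 2.75, above the 1.5 bar
```

The exact formula matters less than the discipline: benefit, cost, and risk are all estimated before the work starts, so the decision can be reviewed afterwards.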

Architecture after refactoring

Core code comparison:

Before refactoring (ChartWidget accesses the database directly):

class ChartWidget(QWidget):  # illustrative code, not production code
    def load_kline_data(self, symbol: str, days: int = 30):
        conn = sqlite3.connect(self.db_path)
        cursor = conn.execute(SQL_QUERY, (symbol,))
        rows = cursor.fetchall()
        # ... data processing

After refactoring (getting data through DataService):

from abc import ABC, abstractmethod  # illustrative code, not production code
from typing import List

class ChartWidget(QWidget):  # assumes the same Qt binding as above
    def __init__(self, data_service: "DataService"):  # forward reference
        super().__init__()
        self.data_service = data_service

    def load_kline_data(self, symbol: str, days: int = 30):
        # retrieve data through the interface without depending on the underlying implementation
        self.kline_data = self.data_service.get_kline(
            symbol=symbol,
            days=days,
            interval='1m'
        )
        self.update()

# data service interface
class DataService(ABC):
    @abstractmethod
    def get_kline(self, symbol: str, days: int, interval: str) -> List[KLine]:
        pass

# concrete implementation backed by SQLite
class SQLiteDataService(DataService):
    def get_kline(self, symbol: str, days: int, interval: str) -> List[KLine]:
        # database access logic is centralized here
        ...
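The testability claim can be demonstrated with an in-memory fake injected in place of SQLite. `KLine` here is a minimal stand-in for the project's actual bar type, and `FakeDataService` is an illustrative test double, not project code:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List

@dataclass
class KLine:  # minimal stand-in for the project's KLine type
    close: float

class DataService(ABC):
    @abstractmethod
    def get_kline(self, symbol: str, days: int, interval: str) -> List[KLine]: ...

class FakeDataService(DataService):
    """In-memory fake: unit tests inject this instead of mocking SQLite."""
    def __init__(self, kline: List[KLine]):
        self._kline = kline

    def get_kline(self, symbol: str, days: int, interval: str) -> List[KLine]:
        return self._kline

# Strategy and transformation logic can now be tested without Qt or a database.
fake = FakeDataService([KLine(100.0), KLine(101.0)])
assert [k.close for k in fake.get_kline("HSI", 30, "1m")] == [100.0, 101.0]
```

This is the concrete payoff of the interface: the test constructs its data in one line instead of standing up a database and a window.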

Refactoring results

Quantitative benefits:

| Metric | Before refactoring | After refactoring | Change |
| --- | --- | --- | --- |
| Files affected by changing the data source | Many | 1 | Significantly reduced |
| Unit test coverage | Lower | Higher | Substantially improved |
| Data-related bugs (per month) | More | Fewer | Significantly reduced |
| New-feature development speed | Slower | Faster | Significantly improved |

Non-quantified benefits:

  • The team dares to change the code (increased psychological safety)
  • Onboarding time for newcomers reduced from 2 weeks to 3 days
  • Data layer issues are no longer a distraction during code review

Architect Review: Symptoms, Triggers, Verification and Residual Costs

| Review field | What readers should see |
| --- | --- |
| Symptom | ChartWidget handled database access, K-line conversion, drawing refresh, and interactive response at once; any GUI tweak could change data semantics. |
| Trigger signal | Changing the data source touched many UI files; data-related bugs were hard to reproduce; unit tests had to mock the GUI, SQLite, and the window life cycle. |
| Before refactoring | The UI was the de facto data owner, with data-source swapping, trading-session attribution, and indicator inputs scattered across widgets. |
| After refactoring | DataService owns data semantics and data-source adaptation; ChartWidget consumes structured data through the interface; tests bypass the GUI and directly verify trading sessions, gap filling, and data-source replacement. |
| Decision basis | This was not directory tidying but moving "is the data trustworthy" out of interface code; as long as the data boundary is not independent, backtesting, live trading, and performance optimization keep contaminating each other inside one UI component. |
| Verification results | Data-source replacement converges from many UI files to the DataService implementation; core data-conversion logic is covered by windowless unit tests; code review can examine the data boundary in isolation. |
| Residual cost | Abstract interfaces add initialization and dependency-injection overhead; the team must maintain the interface contract so shortcut business logic does not creep back into ChartWidget. |
| Rollback strategy | Keep the old read path for a while, confirm DataService output is consistent via double-read comparison, then remove the old SQL entry points. |

Second refactoring: indicator calculation changed from Pandas to incremental

The emergence of performance bottlenecks

After the backtest function was launched, performance problems gradually emerged: it took 3 minutes to run one month’s worth of data, which seriously affected the efficiency of strategy verification.

Profiling results:

Total time: 180.5s

Breakdown:
- indicator calculation: 145.2s (80.4%)
  - MA5/MA10: 42.1s
  - RSI: 38.7s
  - MACD: 35.4s
  - other: 29.0s
- data loading: 25.3s (14.0%)
- signal generation: 6.8s (3.8%)
- other: 3.2s (1.8%)
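A breakdown like the one above can be produced with the standard-library profiler. `run_backtest` here is a stand-in for the real backtest entry point:

```python
import cProfile
import io
import pstats

def run_backtest() -> float:
    """Stand-in workload; the real entry point would replace this."""
    total = 0.0
    for i in range(100_000):
        total += i * 0.5
    return total

profiler = cProfile.Profile()
profiler.enable()
run_backtest()
profiler.disable()

# Sort by cumulative time to find the functions that dominate the run.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Percent-style breakdowns per indicator then come from aggregating the per-function rows in the report.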

Problem code:

import pandas as pd  # illustrative code, not production code
import talib
from typing import List
# Bar is the project's candlestick type, defined elsewhere

def calculate_indicators(bars: List[Bar]) -> pd.DataFrame:
    """Full recalculation of all indicators - the performance bottleneck."""
    df = pd.DataFrame(bars)

    # recompute every historical indicator on each backtest
    df['ma5'] = df['close'].rolling(window=5).mean()
    df['ma10'] = df['close'].rolling(window=10).mean()
    df['ma20'] = df['close'].rolling(window=20).mean()
    df['rsi'] = talib.RSI(df['close'], timeperiod=14)
    df['macd'], df['macd_signal'], df['macd_hist'] = talib.MACD(
        df['close'], fastperiod=12, slowperiod=26, signalperiod=9
    )

    return df

The problem: the backtest advances bar by bar, and every new candlestick triggers a full recalculation of all historical indicators. 10,000 candlesticks × 10 indicators = 100,000 full-history computations.

Refactoring plan: incremental computing architecture

Core Insight:

Indicator calculations fall into two main categories:

  1. History-dependent (such as RSI): depends on accumulated history and cannot be made purely incremental
  2. Sliding-window (such as MA): needs only the latest N K-lines and can be updated incrementally

This figure answers "what state does the incremental indicator maintain?". Readers need not memorize every technical indicator; it is enough to grasp the core of IncrementalMA: window-not-full, window-sliding, continuous-update, gap-recovery, and exception-rebuild are distinct states that cannot be papered over by a plain function call.

Figure 3: IncrementalMA state-transition diagram, showing how a new candlestick updates only the necessary state.

Core code implementation:

from abc import ABC, abstractmethod  # illustrative code, not production code
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional
# Bar is the project's candlestick type, defined elsewhere

@dataclass
class IndicatorState:
    """indicator state base class"""
    timestamp: datetime
    value: float

class IncrementalIndicator(ABC):
    """incremental indicator interface"""

    @abstractmethod
    def update(self, bar: Bar) -> IndicatorState:
        """accept a new bar and return the latest indicator value"""
        pass

    @abstractmethod
    def reset(self):
        """Reset internal indicator state."""
        pass

class IncrementalMA(IncrementalIndicator):
    """incremental moving average - sliding-window implementation"""

    def __init__(self, period: int):
        self.period = period
        self.window: List[float] = []
        self.sum = 0.0

    def update(self, bar: Bar) -> IndicatorState:
        price = bar.close
        self.window.append(price)
        self.sum += price

        # window slides forward
        if len(self.window) > self.period:
            self.sum -= self.window.pop(0)

        # calculate only after the window is full
        if len(self.window) == self.period:
            ma = self.sum / self.period
        else:
            ma = self.sum / len(self.window)

        return IndicatorState(
            timestamp=bar.datetime,
            value=ma
        )

    def reset(self):
        self.window.clear()
        self.sum = 0.0

class IncrementalRSI(IncrementalIndicator):
    """RSI implemented with incremental gain/loss state."""

    def __init__(self, period: int = 14):
        self.period = period
        self.prev_close: Optional[float] = None
        self.gain_sum = 0.0
        self.loss_sum = 0.0
        self.gains: List[float] = []
        self.losses: List[float] = []

    def update(self, bar: Bar) -> IndicatorState:
        if self.prev_close is None:
            self.prev_close = bar.close
            return IndicatorState(bar.datetime, 50.0)  # neutral value

        change = bar.close - self.prev_close
        gain = max(change, 0)
        loss = abs(min(change, 0))

        self.gains.append(gain)
        self.losses.append(loss)

        # accumulate simple sums until the first full window,
        # then switch to Wilder's smoothing over averages
        if len(self.gains) <= self.period:
            self.gain_sum += gain
            self.loss_sum += loss
            if len(self.gains) == self.period:
                # convert the seed sums into averages for Wilder's formula
                self.gain_sum /= self.period
                self.loss_sum /= self.period
        else:
            # Wilder's smoothing
            self.gain_sum = (self.gain_sum * (self.period - 1) + gain) / self.period
            self.loss_sum = (self.loss_sum * (self.period - 1) + loss) / self.period
            self.gains.pop(0)
            self.losses.pop(0)

        self.prev_close = bar.close

        if self.loss_sum == 0:
            rsi = 100.0
        else:
            rs = self.gain_sum / self.loss_sum
            rsi = 100 - (100 / (1 + rs))

        return IndicatorState(bar.datetime, rsi)

    def reset(self):
        self.prev_close = None
        self.gain_sum = 0.0
        self.loss_sum = 0.0
        self.gains.clear()
        self.losses.clear()

Benefit comparison of refactoring

Performance test results:

| Scenario | Before refactoring | After refactoring | Improvement |
| --- | --- | --- | --- |
| 10,000 K-lines | 180.5s | 3.2s | 56x |
| 50,000 K-lines | 892.3s | 14.8s | 60x |
| Memory usage | 1.2GB | 180MB | 6.7x |

Complexity analysis:

old complexity: O(n × m × k)
  n = number of K-lines
  m = number of indicators
  k = average history length recomputed per bar

new complexity: O(n × m)
  each bar updates each indicator once while state is maintained
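The two complexity classes can be contrasted directly. This standalone sketch (independent of the project's IncrementalMA) shows that the O(n) sliding-sum version produces values identical to per-bar full recomputation:

```python
from collections import deque
from typing import List

def full_ma_per_bar(closes: List[float], period: int = 5) -> List[float]:
    """O(n × k): re-average the whole window on every new bar."""
    out = []
    for i in range(len(closes)):
        window = closes[max(0, i - period + 1):i + 1]
        out.append(sum(window) / len(window))
    return out

def incremental_ma(closes: List[float], period: int = 5) -> List[float]:
    """O(n): maintain a running sum, one update per bar."""
    out: List[float] = []
    window: deque = deque()
    running = 0.0
    for price in closes:
        window.append(price)
        running += price
        if len(window) > period:
            running -= window.popleft()
        out.append(running / len(window))
    return out

# Equivalence check over a 50,000-bar series of integer-valued prices
closes = [float(i % 97) for i in range(50_000)]
assert full_ma_per_bar(closes) == incremental_ma(closes)
```

The per-bar cost of the incremental version is constant in the window length, which is exactly the k factor that disappears from the complexity above.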

Key decision points for refactoring

**Decision 1: Should all indicators be converted to incremental form?**

No. Some indicators (such as Bollinger Band width and ATR) are simpler to compute in full and have little performance impact. Only optimize the hotspots that profiling reveals.

**Decision 2: How to verify the correctness of the refactoring?**

def test_indicator_consistency():  # illustrative code, not production code
    """verify that incremental calculation matches full recalculation"""
    bars = load_test_data('hsi_1m_10000.csv')

    # full recalculation (reference implementation)
    df = pd.DataFrame(bars)
    expected_ma = df['close'].rolling(5).mean()

    # incremental calculation
    ma = IncrementalMA(5)
    actual_ma = [ma.update(bar).value for bar in bars]

    # compare within a floating-point tolerance
    for i, (exp, act) in enumerate(zip(expected_ma, actual_ma)):
        if not pd.isna(exp):
            assert abs(exp - act) < 1e-10, f"Bar {i}: expected {exp}, got {act}"

**Decision 3: When should incremental state be abandoned in favor of full recalculation?**

If there are gaps in the data (such as missing bars), the incremental state may become invalid. Strategy:

  • Check data continuity
  • Trigger full recalculation when discontinuous
  • Record the number of recalculations. If recalculations occur frequently, it indicates data quality issues.
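The strategy above can be sketched as a thin wrapper (hypothetical names; assumes 1-minute bars and a simple moving average as the indicator):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Bar:
    datetime: datetime
    close: float


class GapAwareMA:
    """Sketch: incremental MA that falls back to full recalculation on data gaps."""

    def __init__(self, period: int, interval: timedelta = timedelta(minutes=1)):
        self.period = period
        self.interval = interval
        self.bars: List[Bar] = []
        self.recalc_count = 0  # frequent recalculations indicate data-quality issues

    def update(self, bar: Bar) -> float:
        # continuity check: a missing bar invalidates the incremental state
        if self.bars and bar.datetime - self.bars[-1].datetime != self.interval:
            self.recalc_count += 1
            # full-recalculation path: here, simply rebuild from retained history
        self.bars.append(bar)
        window = self.bars[-self.period:]
        return sum(b.close for b in window) / len(window)


t0 = datetime(2025, 1, 1, 9, 30)
ma = GapAwareMA(period=3)
for offset in (0, 1, 3):  # the jump from minute 1 to minute 3 is a gap
    value = ma.update(Bar(t0 + timedelta(minutes=offset), 100.0 + offset))
```

Monitoring `recalc_count` is the cheap part; the point is that it turns "data quality problem" from a vague suspicion into a counter on a dashboard.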

Architect Review: Symptoms, Triggers, Verification and Residual Costs

| Review field | Judgment readers should see |
| --- | --- |
| Symptom | Profiling shows indicator calculation dominates runtime; every K-line advance in the backtest rescans the historical window, so the problem grows linearly with sample size. |
| Trigger signal | A one-month backtest takes minutes, so strategy parameter tuning gets no fast feedback; Pandas full recalculation is simple, but it turns appending one K-line into recomputing the entire history. |
| Before refactoring | The indicator function converts bars into a DataFrame and rolls over it in full; state hides inside the DataFrame computation, with no common model for replay recovery or true incremental updates. |
| After refactoring | IncrementalMA explicitly maintains the window, running sum and current state; a new bar only updates the sliding window; gaps, replay recovery and reset all become testable. |
| Decision basis | Only optimize the hottest paths that profiling reveals; do not force all indicators into incremental form. Sliding-window indicators are incrementalized first, while indicators with strong full-history dependence and unclear benefit are kept as reference implementations. |
| Verification results | Run the Pandas reference implementation and the incremental implementation over the same historical bars and compare results bar by bar; use benchmarks to show the 10,000 K-line case dropped from minutes to seconds. |
| Residual cost | Incremental state adds implementation complexity: initialization, resumption, missing bars and replay recovery must each be tested independently, or the performance optimization hides correctness risks. |
| Rollback strategy | Keep the full Pandas implementation as the reference path; when the continuity check fails or incremental state is abnormal, fall back to full recalculation and record the number of triggers. |

The third refactoring: splitting of chart components

The “God class” disaster

When the project reached a certain stage, chart_widget.py became a “God class”: data acquisition, chart rendering, and user interaction were all mixed together, making it difficult to maintain and test.

This picture answers the question “Who is responsible for what after the split?” If you only cut large files into small files, but the event subscription, refresh sequence and status ownership are still unclear, readers will only see directory changes, not architectural improvements.

Chart Component MVC Container Boundary Chart
Figure 4: Diagram MVC container boundary, splitting the data model, renderer, and interaction controller.

Mixed Responsibilities:

  • Data acquisition (database query, cache management)
  • Chart rendering (K-line, indicator, order, grid)
  • User interaction (mouse, keyboard, scroll wheel)
  • Business logic (price formatting, alignment calculation)

Team Impact:

  • Long modification response time (needs to understand a lot of code)
  • Bug introduction rate is high (changes can easily destroy other functions)
  • Low test coverage (too many logical couplings, difficult to test)

Refactoring plan: MVC layered architecture

File splitting:

| Original file | After split | Responsibilities |
| --- | --- | --- |
| chart_widget.py (large file) | chart_widget.py | Component coordination |
| | data_manager.py | Data management |
| | indicator_manager.py | Indicator calculation |
| | chart_renderer.py | Chart rendering |
| | overlay_renderer.py | Order rendering |
| | interaction_controller.py | User interaction |
| | navigator.py | View navigation |
| Total | Multiple files | Clearer responsibilities |

Core code refactoring example:

Before refactoring (ChartWidget does everything):

class ChartWidget(QWidget):  # illustrative code, not production code
    def __init__(self):
        self.data = []
        self.cached_indicators = {}
        self.zoom_level = 1.0
        self.pan_offset = 0

    def mousePressEvent(self, event):
        # handle mouse press events
        if event.button() == Qt.LeftButton:
            price = self.y_to_price(event.y())
            self.selected_price = price
            self.update()

    def paintEvent(self, event):
        # draw K-line candles
        painter = QPainter(self)
        for i, bar in enumerate(self.data):
            x = self.index_to_x(i)
            self.draw_candle(painter, x, bar)

        # draw indicator lines
        for name, values in self.cached_indicators.items():
            self.draw_line_indicator(painter, values)

    def load_data(self, symbol):
        # query data directly from the database
        conn = sqlite3.connect('data.db')
        cursor = conn.execute("SELECT * FROM bars WHERE symbol = ?", (symbol,))
        self.data = cursor.fetchall()
        self.calculate_indicators()

After refactoring (separation of duties):

# chart_widget.py - coordination layer  # illustrative code, not production code
class ChartWidget(QWidget):
    def __init__(self):
        self.data_manager = DataManager()
        self.indicator_manager = IndicatorManager()
        self.renderer = ChartRenderer()
        self.controller = InteractionController(self)

    def set_symbol(self, symbol: str):
        data = self.data_manager.load(symbol)
        indicators = self.indicator_manager.calculate(data)
        self.renderer.set_data(data, indicators)
        self.update()

# data_manager.py - data access and caching
class DataManager:
    def __init__(self):
        self.cache = DataCache()
        self.source = SQLiteDataSource()

    def load(self, symbol: str, timeframe: str = '1m') -> List[Bar]:
        if self.cache.has(symbol, timeframe):
            return self.cache.get(symbol, timeframe)
        data = self.source.fetch(symbol, timeframe)
        self.cache.set(symbol, timeframe, data)
        return data

# chart_renderer.py - rendering
class ChartRenderer:
    def __init__(self):
        self.candle_renderer = CandleRenderer()
        self.indicator_renderer = IndicatorRenderer()
        self.data = []
        self.indicators = {}

    def set_data(self, data, indicators):
        self.data = data
        self.indicators = indicators

    def render(self, painter: QPainter, rect: QRect):
        self.candle_renderer.render(painter, self.data)
        self.indicator_renderer.render(painter, self.indicators)

# interaction_controller.py - user interaction
class InteractionController:
    def __init__(self, widget: ChartWidget):
        self.widget = widget
        self.navigator = ViewNavigator()

    def handle_mouse_press(self, pos: QPoint):
        if self.widget.hit_test(pos):
            self.navigator.start_drag(pos)

Refactoring results

Quantitative benefits:

| Metric | Before refactoring | After refactoring | Improvement |
| --- | --- | --- | --- |
| Total lines per file | More | Fewer | Clearer code |
| Average function length | Longer | Shorter | Better readability |
| Unit test coverage | Lower | Higher | Substantial improvement |
| Change response time | Longer | Shorter | Higher efficiency |
| Bug introduction rate | Higher | Lower | Better quality |

Non-quantified benefits:

  • Team psychological safety: from “dare not to change” to “dare to change”
  • Code review time: reduced from 1 hour to 15 minutes
  • Newbie understanding time: reduced from 3 days to 2 hours

Architect Review: Symptoms, Triggers, Verification and Residual Costs

| Review field | Judgment readers should see |
| --- | --- |
| Symptom | chart_widget.py became a God Object: Model, Renderer, Controller and business formatting logic were mixed together, so a local change required understanding drawing, data and interaction at once. |
| Trigger signal | Code review time grew, newcomers could not quickly locate bugs, event subscription and refresh order were changed frequently, and test coverage was blocked by the god class. |
| Before refactoring | ChartWidget directly managed data, cached indicators, zoom, pan, mouse events and paint events, with no boundary made into an explicit interface. |
| After refactoring | DataManager owns data, IndicatorManager owns indicators, ChartRenderer owns drawing, InteractionController owns user actions; event subscription and refresh order are wired explicitly through the coordination layer. |
| Decision basis | The split is not about increasing the file count but about giving each kind of change its own landing point: data changes do not affect mouse interaction, interaction changes do not affect indicator calculation, and rendering optimization does not change business semantics. |
| Verification results | Model and Renderer can be unit tested separately; the interaction controller can be tested with event sequences; code review can be split along the boundaries, reducing the context each review must load. |
| Residual cost | As interfaces multiply, event order, cache invalidation and refresh throttling must become a protocol, or the system turns from “one large class that is hard to understand” into “a group of small classes that imply each other.” |
| Rollback strategy | Keep the old ChartWidget external API and let the new modules take over responsibilities internally first; if interaction regression fails, switch back to the old rendering path while retaining the data-model split. |

The fourth refactoring: virtualized chart rendering optimization

Performance bottleneck: Stuttering in charts with large amounts of data

After the MVC split of the chart component, the code structure is clearer, but new performance bottlenecks are encountered when processing large amounts of K-line data:

Problem scenario:

  • When loading 10,000 K lines, the initial rendering takes 800ms, and the user perceives obvious lag.
  • When dragging to view historical data, each redraw requires re-rendering the K-lines in all visible areas.
  • During scaling operations, full redraw causes the frame rate to drop below 15 FPS
  • The memory usage increases linearly with the amount of data, and can reach 500MB+ after long-term operation.

Performance analysis data:

chart performance profiling (10,000 K bars):
- initial render time: 780ms
- drag latency: 120ms/frame
- zoom repaint time: 350ms
- memory usage: 485MB
- GPU texture upload: 120ms (bottleneck)

root cause:

  1. Full Rendering: No matter how many K lines are displayed in the viewport, all data will be calculated and rendered.
  2. No caching: Candlestick geometry is recalculated every frame
  3. GPU texture overflow: A large number of K-lines lead to frequent allocation/release of texture memory

Refactoring plan: sliding window + virtualized rendering

Core Strategy:

  1. Sliding Window: Only maintain the visible area plus a buffer (for example, the viewport displays 200 bars while 400 bars are actually loaded)
  2. Virtual Rendering: Only calculate and render the K-line within the viewport
  3. Offscreen Cache: Pre-rendering a fixed area, panning instead of redrawing when dragging

This picture answers the question “Why virtual rendering is not just about drawing a few fewer K lines.” The real path is that the user drags/zooms to change the viewport, the viewport determines the buffer, the buffer determines whether the offscreen cache can be reused, and texture reuse determines whether to avoid frequent allocation of GPU textures.

Quantitative trading chart virtualization rendering data path
Figure 5: Virtualized rendering path, viewport drives visible data, buffers, off-screen caching and texture reuse.

Core code implementation:

1. Sliding Window Manager

# illustrative code, not production code

@dataclass
class WindowConfig:
    """sliding window configuration"""
    viewport_size: int = 200      # number of K-lines in the visible area
    buffer_ratio: float = 0.5     # buffer size as a ratio of the viewport (50%)
    min_buffer: int = 50          # minimum buffer size

class SlidingWindowManager:
    """sliding window manager - decides which data should be loaded into memory"""

    def __init__(self, config: WindowConfig = None):
        self.config = config or WindowConfig()
        self._full_data: List[Bar] = []
        self._window_start = 0
        self._window_end = 0

    def set_data(self, data: List[Bar]):
        """set the full data source"""
        self._full_data = data
        self._recalculate_window(0)  # start the window at the beginning of the data

    def move_to_index(self, center_index: int):
        """move the window to be centered on the given location"""
        self._recalculate_window(center_index)

    def get_window_data(self) -> Tuple[List[Bar], int, int]:
        """get the current window data and its offset in the full dataset"""
        return (
            self._full_data[self._window_start:self._window_end],
            self._window_start,
            self._window_end
        )

    def _recalculate_window(self, center_index: int):
        """recalculate the window range around the center index"""
        total = len(self._full_data)
        buffer_size = max(
            int(self.config.viewport_size * self.config.buffer_ratio),
            self.config.min_buffer
        )

        # window = viewport plus a buffer on each side
        half_viewport = self.config.viewport_size // 2
        self._window_start = max(0, center_index - half_viewport - buffer_size)
        self._window_end = min(total, center_index + half_viewport + buffer_size)

    def should_reload(self, new_center: int) -> bool:
        """whether the window needs to reload data for the new center"""
        buffer_threshold = self.config.min_buffer // 2
        current_center = (self._window_start + self._window_end) // 2

        # reload once the center drifts beyond the buffer threshold
        return abs(new_center - current_center) > buffer_threshold
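The window arithmetic can be checked in isolation (a sketch re-deriving `_recalculate_window` as a pure function; with the defaults the buffer is max(200 × 0.5, 50) = 100 per side, so an interior window holds about 400 bars, matching the "viewport 200, load 400" rule above):

```python
def window_bounds(center: int, total: int, viewport: int = 200,
                  buffer_ratio: float = 0.5, min_buffer: int = 50) -> tuple:
    """Reproduces SlidingWindowManager._recalculate_window as a pure function."""
    buffer_size = max(int(viewport * buffer_ratio), min_buffer)
    half = viewport // 2
    start = max(0, center - half - buffer_size)
    end = min(total, center + half + buffer_size)
    return start, end


mid_window = window_bounds(center=500, total=10_000)   # interior: full 400-bar window
edge_window = window_bounds(center=10, total=10_000)   # clamped at the left data edge
```

The edge case matters in practice: near the start of history the window simply clamps to index 0 instead of loading a symmetric buffer.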

2. Virtualized renderer

# illustrative code, not production code

class VirtualizedChartRenderer:
    """virtualized chart renderer - render only the visible area"""

    def __init__(self, window_manager: SlidingWindowManager):
        self.window_manager = window_manager
        self._offscreen_cache = OffscreenCache()
        self._geometry_cache: Dict[int, CandleGeometry] = {}

    def render(self, painter: QPainter, rect: QRect, offset_x: float):
        """
        render the chart
        :param offset_x: horizontal pan offset in pixels
        """
        # get the current window data
        window_data, data_start_idx, _ = self.window_manager.get_window_data()

        # compute the index range of the visible area
        visible_indices = self._calculate_visible_indices(
            offset_x, len(window_data), rect.width()
        )

        # check whether the offscreen cache can be reused
        if self._offscreen_cache.is_valid(visible_indices, offset_x):
            # draw directly from the cache
            self._offscreen_cache.draw(painter, rect, offset_x)
            return

        # re-render the visible area when the cache is invalid
        self._render_visible_area(
            painter, rect, window_data, visible_indices, data_start_idx
        )

        # update the cache for the next frame
        self._offscreen_cache.update(
            painter.device(), visible_indices, offset_x
        )

    def _calculate_visible_indices(self, offset_x: float,
                                   data_count: int, viewport_width: int) -> slice:
        """calculate the index range for the current visible area"""
        candle_width = 8  # each K-line candle is 8 px wide
        spacing = 2       # 2 px spacing between candles
        total_width = candle_width + spacing

        # account for the shifted start index
        start_idx = max(0, int(-offset_x / total_width))
        visible_count = int(viewport_width / total_width) + 2  # +2 covers partially visible edge candles
        end_idx = min(data_count, start_idx + visible_count)

        return slice(start_idx, end_idx)

    def _render_visible_area(self, painter: QPainter, rect: QRect,
                            data: List[Bar], visible: slice, data_offset: int):
        """render only the visible area"""
        for i in range(visible.start, visible.stop):
            if i >= len(data):
                break

            bar = data[i]
            geometry = self._get_or_create_geometry(
                i + data_offset, bar, rect.height()
            )
            self._draw_candle(painter, geometry, i - visible.start)

    def _get_or_create_geometry(self, global_idx: int, bar: Bar,
                                height: int) -> CandleGeometry:
        """get or create K-line geometry (cached by global index)"""
        if global_idx not in self._geometry_cache:
            self._geometry_cache[global_idx] = self._calculate_geometry(bar, height)
        return self._geometry_cache[global_idx]
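A worked example of the visible-range arithmetic (a pure-function sketch of `_calculate_visible_indices` under the same 8 px candle + 2 px spacing assumption):

```python
def visible_indices(offset_x: float, data_count: int, viewport_width: int) -> slice:
    """Pure-function version of _calculate_visible_indices."""
    total_width = 8 + 2  # candle width + spacing, in pixels
    start = max(0, int(-offset_x / total_width))
    count = int(viewport_width / total_width) + 2  # +2 covers edge candles
    return slice(start, min(data_count, start + count))


# Panned 100 px to the left over an 800 px viewport: only ~82 of 10,000
# candles are touched per frame, regardless of total history length.
view = visible_indices(offset_x=-100.0, data_count=10_000, viewport_width=800)
```

This is the core of virtualization: per-frame work is bounded by viewport width, not by dataset size.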

3. Off-screen caching system

# illustrative code, not production code

class OffscreenCache:
    """offscreen cache - pre-render and store as a texture, then translate directly during dragging"""

    def __init__(self, cache_size: int = 2048):
        self.cache_size = cache_size
        self._pixmap: Optional[QPixmap] = None
        self._valid_range: Optional[slice] = None
        self._cached_offset: float = 0.0

    def update(self, source: QPaintDevice, visible_range: slice, offset: float):
        """update cached content"""
        if self._pixmap is None or self._pixmap.size().width() != self.cache_size:
            self._pixmap = QPixmap(self.cache_size, source.height())

        # pre-render a range larger than the visible area
        painter = QPainter(self._pixmap)
        # ... rendering logic...
        painter.end()

        self._valid_range = visible_range
        self._cached_offset = offset

    def is_valid(self, current_range: slice, current_offset: float) -> bool:
        """check whether the cache is still usable"""
        if self._pixmap is None or self._valid_range is None:
            return False

        # the cache is only valid within a small offset range
        offset_diff = abs(current_offset - self._cached_offset)
        return offset_diff < 50  # reuse cache only for small viewport shifts

    def draw(self, painter: QPainter, rect: QRect, offset: float):
        """draw from the cache (supports panning)"""
        if self._pixmap is None:
            return

        # source area to copy from the cached pixmap
        source_x = int(offset - self._cached_offset)
        source_rect = QRect(source_x, 0, rect.width(), rect.height())

        # draw cached content
        painter.drawPixmap(rect, self._pixmap, source_rect)

4. GPU texture management

# illustrative code, not production code

class GPUTextureManager:
    """GPU texture manager for allocation and release."""

    def __init__(self, max_textures: int = 10):
        self.max_textures = max_textures
        self._texture_pool: List[QOpenGLTexture] = []
        self._active_textures: Dict[str, QOpenGLTexture] = {}
        self._lru_order: List[str] = []

    def acquire_texture(self, key: str, width: int, height: int) -> QOpenGLTexture:
        """acquire a texture, reusing pooled textures where possible"""
        if key in self._active_textures:
            # refresh LRU order (most recently used goes to the end)
            self._lru_order.remove(key)
            self._lru_order.append(key)
            return self._active_textures[key]

        # need a texture: take one from the pool or create a new one
        if len(self._texture_pool) > 0:
            texture = self._texture_pool.pop()
            texture.setSize(width, height)
            texture.allocateStorage()
        else:
            texture = QOpenGLTexture(QOpenGLTexture.Target2D)
            texture.setSize(width, height)
            texture.setFormat(QOpenGLTexture.RGBA8_UNorm)
            texture.allocateStorage()

        self._active_textures[key] = texture
        self._lru_order.append(key)

        # over capacity: evict the least recently used texture back to the pool
        if len(self._active_textures) > self.max_textures:
            lru_key = self._lru_order.pop(0)
            old_texture = self._active_textures.pop(lru_key)
            self._texture_pool.append(old_texture)

        return texture

Refactoring results

Performance improvements:

| Metric | Before refactoring | After refactoring | Improvement |
| --- | --- | --- | --- |
| Initial rendering time | 780ms | 45ms | 17x |
| Drag response delay | 120ms/frame | 8ms/frame | 15x |
| Zoom redraw time | 350ms | 25ms | 14x |
| Memory usage | 485MB | 85MB | 5.7x |
| Frame rate (dragging) | 15 FPS | 60 FPS | 4x |

Optimization strategy comparison:

| Optimization | Implementation | Effect |
| --- | --- | --- |
| Sliding window | Only load viewport + buffer | Memory reduced 5.7x |
| Virtualized rendering | Only render the visible area | Render time reduced 17x |
| Off-screen caching | Pre-render a larger range, pan when dragging | Smooth 60 FPS dragging |
| Geometry cache | Cache K-line shape calculations | 60% less CPU usage |
| GPU texture pool | Reuse texture objects | Fewer GC pauses |

Core Cognition:

The essence of chart performance optimization is to reduce invalid calculations:

  1. Data level: Use sliding windows to control the amount of data in memory
  2. Rendering Level: Use virtualization to draw only the visible area
  3. Interactive level: Use caching to avoid double calculations
  4. Hardware level: Use texture pools to reduce GPU memory allocation

This solution allows the chart to smoothly process 100,000+ K lines, providing a foundation for high-frequency real-time data display.

Architect Review: Symptoms, Triggers, Verification and Residual Costs

| Review field | Judgment readers should see |
| --- | --- |
| Symptom | The structure is clearer after the MVC split, but initial rendering, dragging, zooming and real-time updates still trigger too much drawing; users see lag rather than clean boundaries. |
| Trigger signal | Dragging drops frames at 10,000 K-lines, zoom redraws reach hundreds of milliseconds, and GPU texture upload and geometry calculation become the new bottlenecks. |
| Before refactoring | The renderer mixed the full data pool with the visible screen area, reprocessing large numbers of invisible candlesticks whenever the viewport changed. |
| After refactoring | The virtualized candle renderer takes the viewport as the entry point and loads only viewport + buffer; the offscreen cache supports pan reuse, and texture reuse cuts GPU allocation cost. |
| Decision basis | The performance bottleneck had moved from the data layer to the view layer; further optimizing indicator calculation could not improve the drag experience. The rendering path had to be built around the visible window, not the complete history. |
| Verification results | Initial rendering, drag response, zoom redraw, memory usage and frame rate all enter the benchmark together; success is judged by user-perceivable interaction latency. |
| Residual cost | Virtualization introduces cache invalidation, edge K-line truncation, coordinate mapping and real-time refresh ordering issues; tests must cover window boundaries, fast zooming and market-data appends. |
| Rollback strategy | Keep the non-virtualized renderer as a low-data fallback; when viewport calculation misbehaves or the cache invalidates frequently, switch back to direct rendering temporarily and record the trigger conditions. |

The fifth refactoring: multi-process architecture separation

Performance bottlenecks of Python GIL

When the project developed to a certain stage, we encountered an inherent bottleneck of Python: GIL (Global Interpreter Lock).

Problem scenario:

  • Indicator calculation takes 100% CPU, but only 1 core can be used
  • The UI freezes during data recording because SQLite writing blocks the main thread.
  • ATR calculation and daily period calculation slow down real-time market processing

Performance analysis data:

On an 8-core CPU, the Python process uses only 12.5%: one core is saturated while the rest sit idle
Indicator calculation: 150ms (perceptible to the user)
Tick recording: at 50 ticks per second, 3-5 ticks are lost each second
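The single-core ceiling can be reproduced with a toy CPU-bound task (a sketch comparing thread and process pools; timings vary by machine, so only result equivalence is checked):

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n: int) -> int:
    """Stand-in for indicator calculation: pure-Python arithmetic holds the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def compare(workers: int = 4, n: int = 200_000):
    work = [n] * workers

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        thread_results = list(pool.map(cpu_bound, work))
    thread_secs = time.perf_counter() - start

    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        process_results = list(pool.map(cpu_bound, work))
    process_secs = time.perf_counter() - start

    # Threads serialize on the GIL; processes can each saturate a core.
    return thread_results == process_results, thread_secs, process_secs


if __name__ == "__main__":
    same, t_threads, t_procs = compare()
```

On a multi-core machine the thread pool shows little or no speedup for this workload, while the process pool scales with the worker count; that asymmetry is the entire motivation for the refactoring below.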

Multi-process architecture design

**Core decision: offload CPU-intensive tasks to independent child processes.**

This picture answers the question “Why multi-threading is not enough”. Before the reconstruction, although the UI, indicators, recording and backtesting were divided into multiple threads, they still competed for the GIL, event loop and log context in the same Python process; after the reconstruction, the UI, calculation, recording and backtesting were split into different processes, and IPC + shared_memory was only responsible for necessary data exchange and control signals.

Comparison chart of failure domains between multi-threaded single-process and multi-process quantitative trading systems
Figure 6: Comparison of multi-threaded single-process and multi-process failure domains, showing why thread splitting is not a substitute for process boundaries.

Detailed explanation of child process responsibilities

| Child process | Responsibilities | Communication | Trigger condition |
| --- | --- | --- | --- |
| ComputeClient | ATR calculation, daily-period minute calculation | Pipe + shared memory | Driven by real-time market data |
| IndicatorWorkerPool | Parallel indicator calculation (N workers) | Queue + shared memory | On data update |
| OfflineWorkerPool | Offline tasks (full calculation, pre-calculation) | Queue + LMDB | User trigger / timer |
| DataRecorder | Tick/K-line recording to database | Pipe + shared memory | Starts automatically on quote subscription |
| Backtest | Independent backtesting engine | Queue + shared memory | Backtest request |
| Trading | Real-time trading module | Queue | Trading instructions |

Inter-process communication mechanism

1. Shared Memory

High-frequency data exchange achieves zero copy through shared memory:

# ComputeClient starts the Worker Process  # illustrative code, not production code
class ComputeClient:
    def start_worker(self, gds_1m_shm_name: Optional[str] = None):
        self._gds_1m_shm_name = gds_1m_shm_name

        # create the parent-exit watchdog Pipe (REQ-NF-13)
        parent_reader, parent_writer = Pipe(duplex=False)
        self._parent_pipe_writer = parent_writer

        # create the duplex IPC Pipe for commands and results
        ipc_child_conn, ipc_parent_conn = Pipe(duplex=True)
        self._ipc_parent_conn = ipc_parent_conn

        # start the worker process (with Qt UI support)
        self._worker_process = Process(
            target=run_compute_worker_with_qt,
            args=(prefix, self._worker_stop_event),
            kwargs={
                "gds_1m_shm_name": self._gds_1m_shm_name,
                "ipc_conn": ipc_child_conn,
                "parent_pipe_reader": parent_reader,
            },
            daemon=True,  # exits together with the parent process
        )
        self._worker_process.start()

2. ATR calculation process

After the ATR calculation is moved out of the main process, the key is not to be “faster”, but that the real-time market will no longer be blocked by CPU-intensive calculations. The main process is only responsible for submitting tasks, reading results and handling exceptions; the sub-process is responsible for calculations; shared_memory is responsible for high-frequency data exchange; the log link is responsible for stringing together requests, processes and results.

3. Data recording sub-process

Data recording runs independently and receives Ticks through the shared memory RingBuffer:

# Recorder subprocess entry point  # illustrative code, not production code
def run_recorder_subprocess(
    event_queue_main_to_sub: Queue,
    event_queue_sub_to_main: Queue,
    tick_shm_symbols: List[str],
):
    # 1. create the QApplication for this process
    app = QApplication([])

    # 2. create the RecorderMainFacade
    facade = RecorderMainFacade(sub_to_main_queue=event_queue_sub_to_main)

    # 3. start the TickShmReader thread (reads ticks from shared memory)
    tick_thread = Thread(
        target=_run_tick_shm_thread,
        args=(recorder_engine, tick_shm_symbols),
        daemon=True
    )
    tick_thread.start()

    # 4. start the event-forwarding thread (main process -> subprocess)
    forward_thread = Thread(
        target=_run_event_forward_thread,
        args=(event_queue_main_to_sub, event_engine, facade),
        daemon=True
    )
    forward_thread.start()

    # 5. enter the command loop
    _run_command_loop(app, recorder_engine, facade)

Refactoring benefits

Performance improvements:

| Metric | Before refactoring | After refactoring | Improvement |
| --- | --- | --- | --- |
| CPU utilization | 12.5% (single core) | 85% (multi-core) | 6.8x |
| Indicator calculation delay | 150ms | 25ms | 6x |
| UI frame rate | 15 FPS | 60 FPS | 4x |
| Tick loss rate | 6-10% | < 1% | 10x |
| Impact of offline tasks | Blocks UI | Runs in background | Imperceptible |

Architectural advantages:

  1. GIL Bypass: CPU-intensive tasks run in sub-processes, taking full advantage of multi-cores
  2. UI response: The main process focuses on the UI and is not affected by background tasks
  3. Fault isolation: The crash of the child process will not cause the main program to exit
  4. Independent expansion: Each sub-process can independently expand or contract according to the load

Key points of technical implementation

1. Parent process exits listening (REQ-NF-13)

def _run_parent_exit_watchdog(parent_pipe_reader: Connection, stop_event: Event):  # illustrative code, not production code
    """runs in the child process; shuts it down when the parent process exits"""
    try:
        # block until the parent process closes the pipe
        parent_pipe_reader.recv()
    except EOFError:
        # parent exited: its end of the pipe was closed by the OS
        stop_event.set()
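The EOF semantics this watchdog relies on can be demonstrated without spawning a process: closing the write end of a `multiprocessing.Pipe` makes `recv()` on the read end raise `EOFError`, exactly as when the parent dies (a minimal sketch):

```python
from multiprocessing import Pipe
from threading import Event

reader, writer = Pipe(duplex=False)
stop_event = Event()

# Simulate parent exit: the OS closes the write end when the parent process dies.
writer.close()

try:
    reader.recv()  # would block for as long as the parent is alive
except EOFError:
    stop_event.set()  # the child shuts itself down
```

This is why the watchdog needs no heartbeat protocol: the operating system closing the file descriptor is the signal.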

2. Shared memory data format

class ComputeShmBackend:  # illustrative code, not production code
    """shared-memory slots for exchanging data between the main process and the Worker"""

    def create_slots(self):
        # daily-period calculation input/output
        self._daily_in = shared_memory.SharedMemory(
            name=f"{self.name_prefix}_daily_in", create=True, size=64
        )
        self._daily_out = shared_memory.SharedMemory(
            name=f"{self.name_prefix}_daily_out", create=True, size=8
        )

        # ATR input/output
        self._atr_in = shared_memory.SharedMemory(
            name=f"{self.name_prefix}_atr_in", create=True, size=1024
        )
        self._atr_out = shared_memory.SharedMemory(
            name=f"{self.name_prefix}_atr_out", create=True, size=16
        )
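Such fixed-size slots can be exercised with struct packing (a minimal sketch; the `demo_atr_out` name and the double-plus-sequence-counter layout are assumptions, sized to fit a 16-byte slot like the ATR output above):

```python
import struct
from multiprocessing import shared_memory

# Worker side: pack the ATR value and a sequence number into the 16-byte slot.
slot = shared_memory.SharedMemory(name="demo_atr_out", create=True, size=16)
struct.pack_into("<dQ", slot.buf, 0, 123.45, 42)

# Main-process side: attach to the same segment by name and unpack.
view = shared_memory.SharedMemory(name="demo_atr_out")
atr_value, sequence = struct.unpack_from("<dQ", view.buf, 0)

view.close()
slot.close()
slot.unlink()  # the owning side releases the segment
```

A sequence counter of this kind is one common way for the reader to detect stale values without any locking; the lifecycle (`close`/`unlink` ownership) is exactly the residual cost the review table below calls out.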

3. Worker process pool management

class IndicatorWorkerPool:  # illustrative code, not production code
    def __init__(self, num_workers: int = 4):
        self._workers: List[IndicatorWorker] = []
        self._supervisor: Optional[WorkerSupervisor] = None

        for i in range(num_workers):
            worker = IndicatorWorker(
                worker_id=f"worker_{i}",
                shared_memory_store=shm_store,
            )
            self._workers.append(worker)

        # start health monitoring
        self._supervisor = WorkerSupervisor(
            workers=self._workers,
            on_worker_restart=self._restart_worker,
        )
        self._supervisor.start()

Summary of architecture evolution

First refactor: decouple data layer and UI layer

Second refactor: incremental indicator updates

Third refactor: split chart components with MVC

Fourth refactor: virtualized chart rendering optimization

Fifth refactor: separate multi-process architecture

Future plan:

  • Backtest subprocess, isolating backtesting from live trading
  • Live-trading subprocess with independent trading logic
  • Possible microservice split for cross-machine deployment

Core Cognition:

Python’s GIL is not a shackle, but a reminder to “use multiple processes when you should use multiple processes”. The multi-process architecture of micang-trader is not over-design, but the inevitable choice for solving actual performance problems.

Architect Review: Symptoms, Triggers, Verification and Residual Costs

| Review field | What readers should take away |
|------|------|
| Symptom | UI, indicator calculation, data recording and backtesting compete for the same Python process. The GIL prevents CPU-intensive tasks from fully utilizing multiple cores; what users see is interface lag and tick loss. |
| Trigger signal | Indicator calculation delay reached a perceptible level, the recording thread blocked the main interface, offline tasks affected live monitoring, and single-thread optimization could no longer reduce the delay. |
| Before refactoring | All tasks shared the main process event loop; calculation, I/O, drawing and user operations amplified each other's faults, and the logs could not distinguish which type of task was causing a delay. |
| After refactoring | Computing, recording, backtesting and UI are separated into different failure domains; shared_memory handles high-frequency data exchange, IPC handles command and control, and logs carry pid, worker_id and trace_id. |
| Decision basis | Multi-processing is not pursued for a sense of distribution, but because the main process could no longer satisfy UI response, real-time market data and CPU-intensive calculation at the same time. Whenever live monitoring is slowed down by backtesting, execution domains must be isolated. |
| Verification results | CPU utilization, indicator latency, UI frame rate, tick loss rate and child-process recovery are verified together; fault injection must prove that a child process exiting does not bring down the main interface. |
| Residual cost | Process serialization, shared_memory lifecycle, exception propagation, log correlation and resource release all become new governance objects; complexity shifts from function calls to process collaboration. |
| Rollback strategy | Child process pools are enabled one by one: offline calculations move out of the main process first, then real-time indicators; on any worker exception, the main process degrades to synchronous calculation or suspends the affected feature. |
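
The isolation described above can be sketched with a `ProcessPoolExecutor`: CPU-bound indicator work runs in worker processes so the main (UI/event-loop) process is never blocked. The `compute_sma` function and the worker layout below are illustrative stand-ins, not micang-trader's actual API.

```python
# Minimal sketch: offload CPU-bound indicator work to worker processes.
from concurrent.futures import ProcessPoolExecutor

def compute_sma(closes: list[float], period: int) -> list[float]:
    """CPU-bound indicator calculation, safe to run in a child process."""
    out = []
    window_sum = 0.0
    for i, c in enumerate(closes):
        window_sum += c
        if i >= period:
            window_sum -= closes[i - period]  # drop the value leaving the window
        if i >= period - 1:
            out.append(window_sum / period)
    return out

def main():
    closes = [float(i) for i in range(1, 11)]
    with ProcessPoolExecutor(max_workers=2) as pool:
        # each submit runs in its own process, sidestepping the GIL
        future = pool.submit(compute_sma, closes, 3)
        print(future.result()[:3])  # [2.0, 3.0, 4.0]

if __name__ == "__main__":
    main()
```

The real system exchanges high-frequency data via shared_memory rather than pickled return values, but the failure-domain idea is the same: a crashed worker loses one future, not the interface.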

Part 2: Reconstructing the decision-making framework

When to refactor

Based on the experience of multiple refactorings, refactoring signals fall into three categories: those that must be handled immediately, those that should be scheduled, and those that can simply be recorded as debt. This classification should not only consider whether the code is ugly, but whether it affects trading correctness, real-time response, backtest consistency and team maintenance risk.

Signal that must be reconstructed (immediate execution)

Signal 1: Modify fear index > 7

Measurement method:

change fear index = (failed tests after change + unexpectedly affected features) / changed lines × 100

  • > 7: refactor immediately
  • 3-7: schedule refactoring
  • < 3: acceptable
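
The index and its thresholds transcribe directly into a small helper. The function names below are ours, purely illustrative:

```python
# Hypothetical helper for the "change fear index" defined above;
# the thresholds mirror the rule of thumb in the text.
def change_fear_index(failed_tests: int, affected_features: int,
                      changed_lines: int) -> float:
    """(failed tests + unexpectedly affected features) / changed lines * 100."""
    return (failed_tests + affected_features) / changed_lines * 100

def classify(index: float) -> str:
    if index > 7:
        return "refactor immediately"
    if index >= 3:
        return "schedule refactoring"
    return "acceptable"

# a 50-line change that broke 3 tests and surprised 2 unrelated features
index = change_fear_index(failed_tests=3, affected_features=2, changed_lines=50)
print(f"{index:.0f} -> {classify(index)}")  # 10 -> refactor immediately
```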

Signal 2: Risk of single point of failure

Checklist:

  • Only 1 person can change a module
  • When this person is on vacation, no one dares to fix the bugs in this module.
  • The departure of key personnel will cause the project to stall

Signal 3: Tests fail to cover core logic

Reasons usually include:

  • Code coupling is too high and cannot be tested independently
  • Depends on external services (database, network)
  • Too many side effects and unpredictable status
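
A common fix for the first two causes is to pass the external dependency in behind an interface, so the core logic can be exercised with a test double. A minimal sketch, assuming a hypothetical `PriceSource` protocol (none of these names are from the project):

```python
# Illustrative sketch: core logic takes its dependency as a parameter
# instead of reaching into a database or network client directly.
from typing import Protocol

class PriceSource(Protocol):
    def last_price(self, symbol: str) -> float: ...

def position_size(capital: float, risk_pct: float,
                  symbol: str, prices: PriceSource) -> int:
    """Core sizing logic, now testable without external services."""
    price = prices.last_price(symbol)
    return int(capital * risk_pct / price)

class FakePrices:
    """Test double: no database, no network, fully predictable."""
    def last_price(self, symbol: str) -> float:
        return 50.0

print(position_size(100_000, 0.02, "AAPL", FakePrices()))  # 40
```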

Signals that should be reconstructed (scheduled execution)

| Signal | Measurement method | Threshold | Action |
|------|------|------|------|
| Code duplication | jscpd or similar tool | > 20% | Extract common code |
| File line count | wc -l | > 1,000 | Split the module |
| Cyclomatic complexity | radon | > 10 | Simplify the logic |
| Function length | average line count | > 50 | Extract functions |
| Performance bottleneck | profiling | > 50% of the time | Optimize the algorithm |

Signal that it’s okay to put it on hold (recording debt)

  • The code is ugly but works, and the frequency of changes is < 1 time/quarter
  • No tests but not core logic (like one-off scripts)
  • The performance is available and the optimization benefit is < 10%

5 questions you must answer before refactoring

This picture answers “how a refactoring request enters formal execution.” If there are no reproductions, no tests, no rollbacks, and no team capacity, you shouldn’t refactor the code just because it doesn’t look pleasing to your eyes.

Quantitative trading system refactoring decision tree
Figure 7: The refactoring decision tree turns “should surgery be performed” into an evidence-driven judgment.

Question 1: What is the goal of refactoring?

Vague goals (precursors of failure):

  • “Make code better”
  • “Improve code quality”
  • “Reduce technical debt”

Clear Goals (Measurable):

  • “Split chart_widget.py from line 3847 into 4 files, < 800 lines each”
  • “Reduce indicator calculation time from 180s to less than 5s”
  • “Increase unit test coverage from 31% to 80%”

SMART Principles of Goal Setting:

| Principle | Example |
|------|------|
| Specific | Not “optimize performance”, but “reduce backtest time to 5s” |
| Measurable | Has clear numerical indicators |
| Achievable | Can be completed with existing resources |
| Relevant | Relevant to business goals |
| Time-bound | Has a specified completion time |

Question 2: How to verify that the reconstruction is successful?

Functional Verification:

# record all test case results before refactoring  # illustrative code, not production code
pre_refactor_results = run_all_tests()

# comparison after refactoring
post_refactor_results = run_all_tests()

assert post_refactor_results == pre_refactor_results

Performance Verification:

# establish the performance baseline  # illustrative code, not production code
benchmark = {
    'backtest_10k_bars': 180.5,  # seconds
    'memory_peak': 1200,  # MB
    'cpu_usage': 85  # %
}

# comparison after refactoring
assert new_performance['backtest_10k_bars'] < 5
assert new_performance['memory_peak'] < 200

Code Metrics Verification:

  • Reduced repeatability
  • Increased coverage
  • complexity reduction
  • Reasonable file length

Question 3: What are the rollback options?

Three-layer rollback strategy:

Level 1: Git (development stage)
- commit in small steps
- on test failure, git revert the offending commit

Level 2: Feature branch (integration stage)
- refactor on a dedicated branch
- run the full test suite before merging
- the main branch stays releasable as the rollback point

Level 3: Feature flag (production stage)
- a code switch selects the old or new path
- the switch is flipped by configuration, without redeploying
- on errors, flip back to the old path

Question 4: What is the input-output ratio?

Reconstruct ROI calculation formula:

refactoring benefit = saved maintenance time × expected change count
        = (old maintenance time - new maintenance time) × change count in the next N months

refactoring cost = development time + testing time + risk cost
        = refactoring days + testing days + (bug-fix time × bug probability)

ROI = refactoring benefit / refactoring cost

Rule of Thumb:

  • ROI > 3: Highly recommended
  • ROI 1.5-3: Worth considering
  • ROI < 1.5: On hold
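
The formula transcribes directly into code. All the inputs below are illustrative placeholders, not measured project numbers:

```python
# Illustrative transcription of the refactoring-ROI formula above.
def refactor_roi(old_hours: float, new_hours: float, changes: int,
                 dev_days: float, test_days: float,
                 bugfix_days: float, bug_prob: float) -> float:
    """benefit = saved time per change * expected changes; cost = dev + test + risk."""
    benefit_days = (old_hours - new_hours) * changes / 8  # 8 work hours per day
    cost_days = dev_days + test_days + bugfix_days * bug_prob
    return benefit_days / cost_days

# e.g. each change drops from 4h to 1h, 48 changes expected over the period,
# 5 dev days + 2 test days, 20% chance of a 3-day bug hunt
roi = refactor_roi(4, 1, 48, 5, 2, 3, 0.2)
print(f"ROI = {roi:.1f}")  # ROI = 2.4, i.e. "worth considering"
```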

ROI calculation example for micang-trader chart component reconstruction:

Benefit:
- each change to the chart component becomes faster after the split
- multiple changes are expected per month, and more over a year
- annual saving: significant

Cost:
- development time: several days
- testing time: several days
- risk cost (potential bug fixes): some time
- total cost: controlled

ROI = benefit / cost > 3 (refactoring recommended)

Question 5: Is the team ready?

Technical preparation:

  • Has complete test coverage
  • There are code specifications (lint, format)
  • Has CI/CD process

Process preparation:

  • There is a Code Review mechanism
  • There is code ownership division
  • Have document maintenance habits

Mental preparation:

  • The team understands the value of refactoring
  • Product teams accept short-term extensions
  • Have management support

Part Three: Refactoring Strategies and Techniques

Strategy 1: Strangler Fig Pattern

Core idea: Rather than rewriting the entire module at once, replace it incrementally, with the old and new code running in parallel.

Implementation steps:

This picture answers why small-step migration suits trading systems better than a one-time rewrite. For high-risk boundaries such as data services, chart renderers, and indicator interfaces, extract the interface first, run old and new implementations in parallel, and finally use regression tests and rollback switches to retire the old path gradually.

Quantitative trading system Strangler Fig small step migration path diagram
Figure 8: Strangler Fig Small-step migration path, showing interface extraction, parallelization of old and new, gradual migration and removal of old links.

Practical Case: Data Service Reconstruction

Phase 1: Extracting the interface

from abc import ABC, abstractmethod  # illustrative code, not production code
from typing import List

class DataService(ABC):
    """data service interface"""

    @abstractmethod
    def get_kline(self, symbol: str, days: int) -> List[Bar]:
        pass

Stage 2: Coexistence of old and new

class ChartWidget:  # illustrative code, not production code
    def __init__(self, use_new_service: bool = False):
        if use_new_service:
            self.data_service: DataService = NewDataService()
        else:
            self.data_service = LegacyDataAccess()  # adapter pattern

Phase 3: Gradual migration

# migrate call sites gradually  # illustrative code, not production code
widget1 = ChartWidget(use_new_service=True)   # migrated to the new service
widget2 = ChartWidget(use_new_service=False)  # keeps the old component for now

Phase 4: Remove old code

# after confirming the migration is complete  # illustrative code, not production code
class ChartWidget:
    def __init__(self):
        self.data_service = NewDataService()  # direct instantiation

Strategy 2: Test first

Test preparation before refactoring:

# 1. ensure all current tests pass  # illustrative code, not production code
$ pytest --tb=short
# 127 passed, 0 failed

# 2. establish the performance baseline
$ python benchmark.py --save-baseline
# Baseline saved to .benchmark/baseline.json

# 3. add missing tests, especially around the refactored area
$ python -m coverage run -m pytest
$ python -m coverage report --show-missing
# add tests for modules below 80% coverage

Test Guard in Refactoring:

# use watchdog to run tests automatically
$ pip install pytest-watch
$ ptw --onpass "notify 'Tests passed'" --onfail "notify 'Tests failed'"

Test verification after refactoring:

# behavior compatibility verification  # illustrative code, not production code
def test_functional_parity():
    old_result = run_with_old_code(input_data)
    new_result = run_with_new_code(input_data)
    assert old_result == new_result

# performance verification
def test_performance_regression():
    new_time = benchmark_new_code()
    baseline = load_baseline()
    assert new_time < baseline * 1.1  # allow a 10% tolerance

Strategy three: AI-assisted reconstruction

Effective usage scenarios of AI:

Scenario 1: Code smell analysis

Prompt:

Analyze the following code and identify 3-5 refactoring issues:
1. Point out specific issues
2. Explain why each issue matters
3. Provide refactoring suggestions

Code:
[paste code]

Scenario 2: Refactoring plan generation

Prompt:

Design a refactoring plan based on these requirements:
- Goal: split a 3000-line ChartWidget
- constraint: preserve behavior and do not break existing features
- Requirements: provide a staged implementation plan where each stage can be rolled back independently

Scenario 3: Test Generation

Prompt:

Generate unit tests for the following function:
- cover normal cases
- cover boundary cases
- cover error cases
- use pytest

Function:
[paste function]

Notes on using AI:

| ✅ Can be used for | ❌ Be vigilant about |
|------|------|
| Identifying code smells | Blindly accepting all suggestions |
| Generating refactoring templates | Over-engineering (e.g. unnecessary factory patterns) |
| Generating test cases | Not verifying the correctness of generated tests |
| Explaining complex code | Letting AI make architectural decisions |

Part 4: Technical Debt Management

Debt Classification and Assessment

Technical Debt Matrix:

This picture answers the question “Which debts should be paid off first?” Debt in a quantitative trading system is not an ordinary TODO list. It must consider both the impact and the cost of repair: debts that affect real-time security, data correctness, and backtest consistency have higher priority than pure code style issues.

Quantitative trading system technical debt priority heat map
Figure 9: Heat map of technical debt priorities, determining repayment order by impact and remediation cost.

Debt Priority Assessment Form:

| Debt item | Scope of impact | Change frequency | Repair cost | Priority |
|------|------|------|------|------|
| ChartWidget takes on too many responsibilities | All charting features | 4 times a week | 14 days | P0 (immediately) |
| Indicator calculation performance bottleneck | Backtesting | 20 times a day | 10 days | P0 (immediately) |
| Data layer coupling | Data-related features | 2 times a week | 5 days | P1 (this month) |
| Utility functions missing type annotations | Developer experience | 1 time per month | 5 days | P2 (this quarter) |
| No tests for old scripts | Stable, rarely changed | 1 time per quarter | 3 days | P3 (record only) |
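
One way to make the assessment table sortable is to collapse its columns into a single score: higher impact and change frequency, lower repair cost, pay first. The weights and numbers below are illustrative, not the project's actual metrics:

```python
# Hypothetical scoring sketch for ordering a technical-debt list.
from dataclasses import dataclass

@dataclass
class Debt:
    name: str
    impact: int            # 1 (cosmetic) .. 5 (blocks core features)
    changes_per_month: float
    repair_days: float

    def score(self) -> float:
        """Higher score = repay sooner."""
        return self.impact * self.changes_per_month / self.repair_days

debts = [
    Debt("ChartWidget responsibilities", 5, 16, 14),
    Debt("Indicator perf bottleneck", 5, 600, 10),
    Debt("Untyped utils", 2, 1, 5),
]
# print the repayment order, highest score first
for d in sorted(debts, key=Debt.score, reverse=True):
    print(f"{d.name}: {d.score():.1f}")
```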

Debt visualization and tracking

technical_debt.md template:

# Technical debt list

## P0 - handle immediately, blocks development

### DEBT-001: ChartWidget takes on too many responsibilities
- **location**: ui/chart_widget.py
- **symptom**: one file mixes data management, rendering and interaction; every change is risky
- **impact**: all charting features
- **suggested solution**: split into DataManager/ChartRenderer/ChartController
- **estimated cost**: 14 days
- **expected benefit**: significantly lower change cost and bug rate
- **created date**: 2025-10-15
- **status**: 🟡 in progress
- **owner**: milome

### DEBT-002: indicator calculation performance bottleneck
- **location**: core/indicators.py
- **symptom**: backtesting 10,000 K-line bars takes 180 seconds
- **impact**: low strategy validation efficiency
- **suggested solution**: implement the IncrementalIndicator interface
- **estimated cost**: 10 days
- **expected benefit**: large performance improvement (target < 5s)
- **created date**: 2025-11-20
- **status**: 🔴 to be scheduled
- **owner**: unassigned

## P1 - handle this month, affects efficiency

### DEBT-003: data layer coupled to the UI
- **location**: ui/*.py
- **symptom**: changing the data source requires modifying UI code
- **impact**: data source migration is blocked
- **suggested solution**: introduce a DataService interface
- **estimated cost**: 5 days
- **created date**: 2025-10-25
- **status**: 🟢 recorded

## P2 - handle this quarter, improves experience

### DEBT-004: insufficient type annotation coverage
- **location**: utils/*.py
- **symptom**: weak IDE completion; type errors surface only at runtime
- **impact**: developer experience
- **suggested solution**: add type annotations gradually
- **estimated cost**: 5 days
- **created date**: 2025-10-28
- **status**: 🟢 recorded

Develop a debt repayment plan

Q2 2026 Technical Debt Repayment Plan

April (focus on P0)

  • DEBT-001: ChartWidget split

    • Person in charge: milome
    • When: Weeks 1-3
    • Acceptance criteria:
      • Unit test coverage > 80%
      • Code review passed
      • Functional regression test passed
  • DEBT-002: Indicator incremental calculation

    • Person in charge: milome
    • Time: Week 4
    • Acceptance criteria:
      • Backtest performance < 5 seconds
      • The result is consistent with the full calculation

May (processing P1)

  • DEBT-003: Data layer decoupling
    • Person in charge: to be assigned
    • When: Weeks 1-2

June (reserved buffer)

  • DEBT-004: Type annotations (time permitting)
  • or deal with newly discovered P0/P1 debt

Part 5: Architecture Evolution Roadmap

The complete evolution timeline of micang-trader

The roadmap shows the evolution of micang-trader from prototype to maintenance period. Readers can focus on which of the most critical system constraints is solved at each stage, rather than stuffing later complexity into the prototype stage in advance.

Quantitative trading system architecture evolution roadmap
Figure 10: Architecture evolution roadmap showing the core constraints and governance priorities at each stage from prototype to maintenance.

Key decisions at each stage

Prototype stage → Growth stage:

  • Decision: Should I invest time in building the architecture, or continue with heap functionality?
  • Our choice: do the first refactoring at the right time
  • Result: Avoiding larger technical debt later on

Growth Phase → Performance Phase:

  • Decision: Prioritize performance optimization or feature development?
  • Our choice: Pause the feature for 2 weeks and do performance refactoring
  • Result: User satisfaction increased significantly

Performance period → Stability period:

  • Decision: Does the chart component need a complete rewrite?
  • Our choice: Split rather than rewrite (Strangler Fig pattern)
  • Result: Smooth transition, no functional regression

Stable period → Expansion period:

  • Decision: How to optimize chart performance issues? Is virtualization worth the investment?
  • Our choice: implement sliding window + virtualized rendering + off-screen caching
  • Result: Supports smooth display of 100,000+ K lines and reduces memory usage by 5.7x

Extension period → Maintenance period:

  • Decision-making: How to break through the Python GIL bottleneck?
  • Our choice: multi-process architecture separation, CPU-intensive task offload
  • Result: CPU utilization increased from 12.5% to 85%, and indicator calculation latency dropped from 150ms to 25ms

Maintenance Period:

  • Decision: How to prevent technical debt from accumulating again?
  • Our options: Set up a debt tracking mechanism and regular payoff plan
  • Result: Debt is controllable and development speed is stable

Architecture Decision Record (ADR)

ADR Template:

This diagram answers “How does ADR transform from a discussion into a chain of evidence that can be reviewed in the future?” Readers can understand it as a transaction log at the architectural level: issues, candidate solutions, final decisions, verification results and review triggering conditions must be tracked by subsequent maintainers.

Quantitative trading system ADR decision review sequence diagram
Figure 11: ADR decision review sequence, linking background, plan, consequences and review conditions into a closed loop.
# ADR-005: Switch indicator calculation from Pandas to incremental implementation

## status
- date: 2025-12-15
- status: Accepted
- decision maker: milome

## context
Backtest performance became a serious bottleneck: 10,000 K-line bars took 180 seconds.
Profiling showed indicator calculation accounted for 80% of the time.

## decision
Implement an IncrementalIndicator interface supporting per-bar incremental updates as well as full recalculation.

## consequences

### positive
- roughly 60x performance improvement (180s → 3s)
- 80% lower memory usage (1.2GB → 200MB)
- support unified real-time calculation and backtesting

### negative
- implementation complexity increases
- requires state maintenance and is harder to debug
- some indicators such as Bollinger Bands are harder to implement incrementally

## alternatives

| Alternative | Pros | Cons | Decision |
|------|------|------|------|
| Numba | small change | only ~3x improvement, insufficient | ❌ |
| Cython | best raw performance | high development cost, hard to maintain | ❌ |
| Incremental calculation | balanced | moderate complexity | ✅ |

## References
- performance test report: docs/benchmarks/indicator-perf.md
- implementation code: core/indicators/incremental/
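
A hedged sketch of the IncrementalIndicator idea from this ADR, using an incremental SMA whose per-bar updates are checked against full recalculation, exactly the "results consistent with the full calculation" acceptance criterion. Class and method names are illustrative, not the project's actual interface.

```python
# Illustrative incremental indicator: O(1) per-bar update, verified
# against a full-recalculation reference implementation.
from collections import deque
from typing import Optional

class IncrementalSMA:
    def __init__(self, period: int):
        self.period = period
        self.window = deque(maxlen=period)
        self.total = 0.0

    def update(self, close: float) -> Optional[float]:
        """Add the new close, drop the oldest; O(1) per bar."""
        if len(self.window) == self.period:
            self.total -= self.window[0]  # deque evicts it on append
        self.window.append(close)
        self.total += close
        if len(self.window) < self.period:
            return None  # not enough bars yet
        return self.total / self.period

def full_sma(closes: list[float], period: int) -> list[float]:
    """Reference full recalculation used to verify the incremental path."""
    return [sum(closes[i - period + 1:i + 1]) / period
            for i in range(period - 1, len(closes))]

closes = [10.0, 11.0, 12.0, 13.0, 14.0]
inc = IncrementalSMA(3)
incremental = [v for c in closes if (v := inc.update(c)) is not None]
assert incremental == full_sma(closes, 3)  # the two paths must agree
```

Stateful indicators like this are harder to debug than a Pandas rolling call, which is exactly the "negative consequence" the ADR records; the parity assertion is what keeps the trade-off honest.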

Part Six: Team and Culture Building

Resistance and coping strategies for reconstruction

Resistance 1: “As long as it can run, don’t move it”

symptom:

  • Although the code is bad, it “works”
  • Worry about refactoring introducing new bugs
  • Emotional attachment to existing code (“I wrote this”)

Coping strategies:

  1. Let data speak

    "This module produced 23 bugs in the past three months,
     accounting for 40% of all project bugs.
     After refactoring, it is expected to drop below 5."
  2. Small step verification

    • Refactor a small module first (1-2 days of work)
    • Demonstrate benefits (increased development speed, reduced bugs)
    • Gain the trust of your team before expanding your scope
  3. Establish a sense of security

    • Perfect testing
    • Clear rollback plan
    • Incremental replacement rather than big bang rewrite

Resistance 2: “I don’t have time to refactor”

symptom:

  • Products rush to launch new features
  • Consider refactoring to be “extra work”

Coping strategies:

  1. Refactoring is an investment, not a cost

    "Spend 5 days on refactoring and save 2 hours on each future change.
     With 4 changes per month, the investment pays back in about 5 months."
  2. Allow 20% time for technical debt

    • Set aside 20% of time each sprint to work on technical debt
    • New feature estimates include “debt repayment time”
  3. Technical Debt Visualization

    • Regularly present the debt list to the product team
    • Explain the impact of debt on development speed

Resistance 3: “Refactoring is too risky”

symptom:

  • Worry about system instability caused by reconstruction
  • Fear of affecting online users

Coping strategies:

  1. Test first

    • Supplementary testing to 80% coverage before refactoring
    • Establish a performance baseline
  2. Grayscale release

    # use a feature flag  # illustrative code, not production code
    if feature_flags.enable_new_chart:
        render_with_new_code()
    else:
        render_with_old_code()
  3. Monitoring and Rollback

    • Key indicator monitoring (error rate, performance)
    • Exception automatic rollback mechanism
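
The "automatic rollback mechanism" above can be sketched as a guard that flips the feature flag back when the new path's error rate crosses a threshold. `FeatureFlags`, the thresholds and the simulated 10% error rate are all hypothetical:

```python
# Illustrative auto-rollback guard for a feature-flagged code path.
class FeatureFlags:
    def __init__(self):
        self.enable_new_chart = True

class RollbackGuard:
    def __init__(self, flags: FeatureFlags, max_error_rate: float = 0.05,
                 min_samples: int = 100):
        self.flags = flags
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples  # avoid reacting to tiny samples
        self.calls = 0
        self.errors = 0

    def record(self, ok: bool) -> None:
        """Record one call on the new path; roll back if too many fail."""
        self.calls += 1
        if not ok:
            self.errors += 1
        if (self.calls >= self.min_samples
                and self.errors / self.calls > self.max_error_rate):
            self.flags.enable_new_chart = False  # automatic rollback

flags = FeatureFlags()
guard = RollbackGuard(flags)
for i in range(200):
    guard.record(ok=(i % 10 != 0))  # simulate a 10% error rate
print(flags.enable_new_chart)  # False: guard rolled back to the old path
```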

Code Review Checklist

Architectural level:

  • Whether to follow the single responsibility principle (a class/function only does one thing)
  • Whether module dependencies are reasonable (no circular dependencies)
  • Is the interface design clear (easy for the caller to understand)
  • Whether it introduces unnecessary complexity

Code level:

  • Whether necessary tests (unit tests, integration tests) have been added
  • Whether the documentation (docstring, README) has been updated
  • Whether new technical debt has been introduced (temporary solutions must have TODO)
  • Whether the code style complies with the specifications (lint, format)

Refactoring project:

  • Whether behavioral consistency is maintained (functionality unchanged)
  • Whether a performance baseline has been established (no performance rollback)
  • Is there a rollback plan (feature flag or branch)
  • Whether to implement it in phases (can be verified step by step)

Build a culture of refactoring

1. Refactoring as a routine

It’s not “wait until the code sucks before refactoring it”, it’s “clean it up when you see the bad smell”.

Boy Scout Rule:

“Leave the campground cleaner than you found it.”

Every time you submit code, make it a little better than before.

2. Technical Debt Meeting

Monthly 30-minute meeting:

  • A look back at debt discovered this month
  • Evaluate priorities
  • Assign repayment tasks for next month

3. Refactoring Sharing Session

Quarterly 1-hour sharing:

  • Share refactoring cases
  • Discuss challenges encountered
  • Summarize best practices

4. Incentive mechanism

  • Recognize refactoring contributions (not just feature development)
  • Incorporate code quality into performance reviews
  • Establish a “Cleanest Code Award”

Summary: Complete Checklist for Refactoring

Before refactoring

  • Clear goal: The refactoring goal is quantifiable (such as “split the file from 3000 lines to 4 files of 800 lines”)
  • Test Improvement: Current test coverage > 80%, all passed
  • ROI Assessment: Input-output ratio > 1.5
  • Rollback plan: Have a clear rollback strategy (Git/Feature Flag)
  • Team Ready: The team understands and supports refactoring
  • Time Reserve: Have enough time to complete without being interrupted by urgent needs

Under reconstruction

  • Small steps: < 100 lines per change, frequent submissions
  • Test Guard: Run tests immediately after each commit
  • Code Review: Critical changes require Code Review
  • Documentation Update: Synchronously update documents and comments
  • Performance Monitoring: Compare performance indicators before and after reconstruction

After reconstruction

  • Functional verification: 100% test passed, no function regression
  • Performance Verification: Achieve preset performance goals
  • Code Metrics: Reduced duplication, increased coverage, reduced complexity
  • Team Synchronization: Share refactoring experiences and best practices to the team
  • Debt Update: Update the technical debt list and mark paid items

Series Review and Outlook

The core ideas of the seven articles

| Part | Core theme | Key takeaways |
|------|------|------|
| One | Architecture design | Independent data layer, unified multi-timeframe abstraction, backtest/live consistency |
| Two | Python practice | Floating-point precision, time zone handling, memory management, concurrency safety |
| Two (supplement) | Python pitfalls | 50 deep trap analyses, AI-assisted pitfall avoidance |
| Three | AI engineering | Specifications first, multi-agent collaboration, human-machine division of labor |
| Four | Performance optimization | Profiler-driven diagnosis, algorithm optimization, compilation acceleration, caching strategy |
| Five | Testing strategy | AI-assisted TDD, property-based testing, boundary-time testing |
| Six | Architecture evolution | Five refactoring records, decision framework, technical debt management, multi-process architecture |

A core understanding

**The development of quantitative systems is not a one-time event, but a continuous process of evolution.**

Good architecture is not designed, it evolves. The key is:

  • Make the right choice at every decision point (using the decision framework from earlier)
  • Pay off technical debt promptly (don’t accumulate it beyond your means)
  • Let the architecture grow with the business (the architecture serves the business, not the other way around)

Tips for readers

If you are a quantitative system developer:

  1. Don’t pursue a perfect initial architecture - let the system run first, and then gradually optimize it
  2. Build technical debt awareness - record debt, pay off regularly, and prevent accumulation
  3. Invest in Testing - Testing is a safety net for refactoring and a source of confidence
  4. Make good use of multiple processes - Python’s GIL is not a shackle, use it when you should use multiple processes

If you are a technical lead:

  1. Give the team time to refactor - Set aside 20% time for technical debt
  2. Establish a culture of refactoring - Recognize refactoring contributions, not just feature development
  3. Let data speak - ROI assessment, performance benchmarks, code metrics
  4. Architecture grows with business - from single process to multi-process, from single to distributed

Reference resources

books

  • Refactoring: Improving the Design of Existing Code (Martin Fowler)
  • Clean Architecture (Robert C. Martin)
  • Code Complete (Steve McConnell)
  • Release It! Design and Deploy Production-Ready Software (Michael T. Nygard)

Articles & Essays

  • “Technical Debt Quadrant” (Martin Fowler)
  • “The Boy Scout Rule”
  • “Strangler Fig Pattern”
  • “Architecture Decision Records”

tool

  • Code Analysis: SonarQube, CodeClimate, pylint, mypy
  • Testing: pytest, coverage.py, hypothesis (property-based testing)
  • Performance: cProfile, line_profiler, memory_profiler
  • Visualization: Mermaid, PlantUML

Series context

You are reading: Quantitative trading system development record. This is article 6 of 7.
