Every night, raw price data from global markets travels through seven acts — arriving as noise, leaving as clarity.
Full extraction-to-consumption pipeline running on a Raspberry Pi 5 cluster. Raw market data flows from 4 Docker extraction nodes through PostgreSQL, a dbt staging view, and a fully materialised analytics table — consumed by Power BI, a .NET 10 MVC app, and a local NL/SQL pipeline. Refresh time reduced from 60 min → 35 min via CTE collapse and work_mem tuning.
PostgreSQL NaN ≠ IEEE 754 NaN.
Power BI raised DataSource.Error: Numeric NaN not supported by System.Decimal on
window-function computed columns. PostgreSQL float8 NaN satisfies
NaN = NaN → TRUE, so the IEEE 754 self-inequality guard
(value != value) silently passes NaN through. Fix: test with
value = 'NaN'::float8 in stg_stocks_cleansed, and apply
safe_numeric() macro to every numeric(18,x) cast in
stocks_analytics to catch NaN propagating through lag/lead arithmetic.
PostgreSQL 12+ treats each CTE as an optimisation fence — fully materialising it
to disk before the next one reads it. The original view had 5 CTEs, causing 5 full
passes over 70M rows. Collapsing stock_data + ema_calc + macd_line_calc into a single
core_calcs CTE eliminated 2 redundant sort cycles. Combined with
SET LOCAL work_mem = 512MB in the dbt pre_hook, this cut wall-clock time by 42%.
0 1 * * *run_stocks.sh which orchestrates extraction across all nodes and calls run_dbt.sh as its final step. All output appends to $LOG./proc/sys/kernel/random/uuid to avoid provider fingerprinting. Writes to public.integrated_stocks.{{ source('public', ...) }} references. integrated_stocks holds 70M+ OHLCV rows across global tickers. stocks_info is the sector/country dimension lookup. sp500_tickers drives the S&P 500 inclusion filter.stg_stocksrow_number() OVER (PARTITION BY ticker, date ORDER BY open) to eliminate 35,582 duplicate ticker/date pairs. Removes 23.5M ghost rows from downstream.stg_stocks_cleansedvalue = 'NaN'::float8 — not value != value.NULL0safe_numeric() macro applies the same guard to every numeric(18,x) cast in stocks_analytics.stocks_analyticsPARTITION BY ticker ORDER BY date enabling a single sort pass. SET LOCAL work_mem = '512MB' via pre_hook. Post_hook runs ANALYZE. Unique index on (ticker, date).PBIEgwService every 5 min if down.dbt_stocks.stocks_analytics in PostgreSQL live. Always reflects the latest dbt run.