Full extraction-to-consumption pipeline running on a Raspberry Pi 5 cluster. Raw market data flows from 4 Docker extraction nodes through PostgreSQL, a dbt staging view, and a fully materialised analytics table — consumed by Power BI, a .NET 10 MVC app, and a local NL/SQL pipeline. Refresh time reduced from 60 min → 35 min via CTE collapse and work_mem tuning.
PostgreSQL NaN ≠ IEEE 754 NaN.
Power BI raised DataSource.Error: Numeric NaN not supported by System.Decimal on
window-function computed columns. PostgreSQL float8 NaN satisfies
NaN = NaN → TRUE, so the IEEE 754 self-inequality guard
(value != value) silently passes NaN through. Fix: test with
value = 'NaN'::float8 in stg_stocks_cleansed, and apply
safe_numeric() macro to every numeric(18,x) cast in
stocks_analytics to catch NaN propagating through lag/lead arithmetic.
PostgreSQL 12+ treats each CTE as an optimisation fence — fully materialising it
to disk before the next one reads it. The original view had 5 CTEs, causing 5 full
passes over 70M rows. Collapsing stock_data + ema_calc + macd_line_calc into a single
core_calcs CTE eliminated 2 redundant sort cycles. Combined with
SET LOCAL work_mem = 512MB in the dbt pre_hook, this cut wall-clock time by 42%.
0 1 * * *run_stocks.sh which orchestrates extraction across all nodes and calls
run_dbt.sh as its final step. All output appends to $LOG.
/proc/sys/kernel/random/uuid to avoid
provider fingerprinting. Writes to public.integrated_stocks.
{{ source('public', ...) }} references.
integrated_stocks holds 70M+ OHLCV rows across global tickers.
stocks_info is the sector/country dimension lookup.
sp500_tickers drives the S&P 500 inclusion filter.
stg_stocksrow_number() OVER (PARTITION BY ticker, date ORDER BY open) to eliminate
35,582 duplicate ticker/date pairs caused by float precision artifacts in
split-adjusted OTC tickers. Removes 23.5M ghost rows from downstream.
stg_stocks_cleansedfloat8 NaN satisfies NaN = NaN → TRUE — the
IEEE 754 self-inequality trick (value != value) does not work.
Must use value = 'NaN'::float8.NULL0NULLsafe_numeric() macro applies the same guard to every numeric(18,x)
cast in stocks_analytics, catching NaN that propagates through lag/lead window
arithmetic on historical rows.
stocks_analyticsPARTITION BY ticker ORDER BY date
enabling a single sort pass in core_calcs. SET LOCAL work_mem = '512MB'
via pre_hook eliminates disk sort spills. Post_hook runs ANALYZE to update
planner statistics. Unique index on (ticker, date).
PBIEgwService every 5 min if down. Windows Update active hours configured
to protect 1am dbt run window.
stocks_analytics directly.
dbt_stocks.stocks_analytics in PostgreSQL live.
Always reflects the latest dbt run. No stale data for today's rows.