Commit Graph

41 Commits

Author SHA1 Message Date
JamesFlare1212
fb68c1ad5d refactor(scan): remove multi-thread scan logic, use sequential processing 2026-04-08 12:04:27 -04:00
JamesFlare1212
78c050a6fa refactor(s3): remove automatic image deletion, users manage S3 files 2026-04-08 10:29:27 -04:00
JamesFlare1212
1e234624fb fix(s3): URL mismatch 2026-04-08 00:00:44 -04:00
JamesFlare1212
6c58eacc8f fix(s3): race condition and mismatched URL in Redis 2026-04-07 23:21:54 -04:00
JamesFlare1212
bbbd59be94 fix(s3): update deleting all files in S3 2026-04-07 22:45:10 -04:00
JamesFlare1212
ea9e9ec121 remove(proxy): remove warp-proxy 2026-04-07 18:19:13 -04:00
JamesFlare1212
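0a133159e8 refactor scan: implement multi-threaded concurrent crawling
- Add Semaphore class to control the concurrency level
- Add BatchProcessor batch handler with progress callbacks
- Refactor initializeClubCache and updateStaleClubs to run concurrently
- Fix cookie 4xx detection logic (only 401/403 trigger a re-login)
- Add environment variable configuration: CONCURRENT_API_CALLS, etc.
- Add concurrency test script test-concurrency.ts

Performance: moved from serial processing to configurable concurrent processing (8 threads by default)
Fixed: 404 errors are no longer misidentified as authentication failures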
2026-04-07 18:18:18 -04:00
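The Semaphore-based concurrency limit described in the commit above can be sketched as follows. This is a minimal illustration, not the repository's code: the method names acquire and release and the helper runLimited are assumptions, and only the class name Semaphore comes from the commit message.

```typescript
// Hypothetical counting semaphore; the project's actual Semaphore class is not shown in the log.
class Semaphore {
  private waiters: Array<() => void> = [];
  private available: number;

  constructor(permits: number) {
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No permit is free: park until release() hands one over.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // pass the permit directly to the next waiter
    } else {
      this.available++;
    }
  }
}

// Hypothetical helper: runs every task, with at most `limit` in flight at once.
async function runLimited<T>(limit: number, tasks: Array<() => Promise<T>>): Promise<T[]> {
  const sem = new Semaphore(limit);
  return Promise.all(
    tasks.map(async (task) => {
      await sem.acquire();
      try {
        return await task();
      } finally {
        sem.release(); // always return the permit, even when the task throws
      }
    }),
  );
}
```

Note that in a single Node or Bun process the "threads" are concurrent promises on one event loop, so a limit like this bounds in-flight requests rather than CPU parallelism.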
JamesFlare1212
fc98dbbbae fix(scan): remove p-limit 2026-04-07 08:38:15 -04:00
JamesFlare1212
af493446ac fix(redis): remove max mem size 2026-04-07 08:32:13 -04:00
JamesFlare1212
821df1c51f fix(scan): prevent exponential slowdown from event loop blocking
- Reduce default CONCURRENT_API_CALLS from 10 to 5 (Sharp AVIF is CPU-intensive)
- Create fresh p-limit instance per batch instead of module singleton
- Add garbage collection hint between batches
- Fix skippedCount tracking (was never incremented)
- Increase batch delay from 100ms to 500ms for event loop drainage
2026-04-07 07:35:48 -04:00
JamesFlare1212
573a9b3f4c fix(scan): batch processing and timeout reduction to prevent stall at 20%
- Process activities in batches of 100 instead of 5001 promises upfront
- Clear promise array after each batch to free memory (85MB→15MB peak)
- Reduce API timeout from 20s to 10s and retries from 3 to 2
- Total time per failed request: 63s→23s (63% faster failure)
- Expected total scan time: 8.5h→1.5h (82% faster)
2026-04-07 07:19:46 -04:00
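The batching approach in the commit above (fixed-size batches instead of creating every promise up front) might look roughly like this. The function name processInBatches and its parameters are illustrative assumptions; the batch size of 100 is the only value taken from the commit message.

```typescript
// Hypothetical helper; name and signature are illustrative, not taken from the repository.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
  delayMs = 0,
): Promise<R[]> {
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    // Only this slice's promises exist at any one time, which caps peak memory.
    const batch = items.slice(start, start + batchSize);
    const done = await Promise.all(batch.map((item) => worker(item)));
    results.push(...done);
    // Optional pause between batches so the event loop can drain.
    if (delayMs > 0 && start + batchSize < items.length) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return results;
}
```

With 5001 IDs and a batch size of 100 this would run 51 batches, the last one holding a single item.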
JamesFlare1212
b426861b56 add(docker): extra hosts 2026-04-07 00:12:58 -04:00
JamesFlare1212
6fa6d83e91 clean up 2026-04-07 00:09:38 -04:00
JamesFlare1212
92b12a6a85 fix(scan): prevent progressive slowdown with mutex, batching, and connection pooling
- Add mutex to cron jobs to prevent overlapping runs
- Replace Promise.all with batched processing (50/batch) in updateStaleClubs
- Configure HTTP connection pooling with keep-alive (maxSockets: 50)
- Add memory monitoring to scan progress logs
- Reduce CONCURRENT_API_CALLS from 8 to 5 to reduce Sharp memory pressure
2026-04-07 00:00:56 -04:00
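One simple way to keep cron runs from overlapping, as the first bullet of the commit above describes, is a guard flag around the job. The sketch below is an assumption about the general technique; withOverlapGuard is a hypothetical name and the repository may implement its mutex differently.

```typescript
// Hypothetical guard: skips a tick while the previous run is still in flight.
function withOverlapGuard(job: () => Promise<void>): () => Promise<boolean> {
  let running = false;
  return async () => {
    if (running) {
      return false; // previous run still active, skip this tick
    }
    running = true;
    try {
      await job();
      return true;
    } finally {
      running = false; // release even if the job throws
    }
  };
}
```

The returned function resolves to true when the job actually ran and false when the tick was skipped, which makes skipped runs easy to log.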
JamesFlare1212
eca0f1aec3 clean up 2026-04-06 23:11:13 -04:00
JamesFlare1212
f1967d5519 fix(cache): use Promise.allSettled to prevent hung promises from blocking scan
Root cause: Promise.all() waits for ALL promises, so a single hung/slow request
blocks the entire batch. With 5001 promises and 16 concurrent limit, timeouts
cause cascading delays that appear as 'scan stopped'.

Fix:
- Extract processSingleActivity() helper function
- Use Promise.allSettled() instead of Promise.all()
- Each promise handles its own success/error counting
- Prevents single hung promise from blocking entire scan

Impact: Scan should now complete all 5001 IDs without getting stuck
2026-04-06 21:48:10 -04:00
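A minimal sketch of the Promise.allSettled pattern from the commit above, where each result is counted individually instead of one rejection failing the whole batch. The body of processSingleActivity here is a stand-in that fails for some IDs; the real helper is not shown in the log, and scanAll is a hypothetical name.

```typescript
// Hypothetical stand-in for the per-activity work; fails for every third ID to simulate errors.
async function processSingleActivity(id: number): Promise<number> {
  if (id % 3 === 0) {
    throw new Error(`activity ${id} failed`);
  }
  return id;
}

// Counts successes and failures without letting one rejection abort the rest.
async function scanAll(ids: number[]): Promise<{ success: number; failed: number }> {
  const results = await Promise.allSettled(ids.map((id) => processSingleActivity(id)));
  let success = 0;
  let failed = 0;
  for (const result of results) {
    if (result.status === "fulfilled") {
      success++;
    } else {
      failed++;
    }
  }
  return { success, failed };
}
```

Promise.allSettled still waits for every promise to settle, so a request that never resolves also needs its own timeout; the later commits that reduce the API timeout address that side.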
JamesFlare1212
5f630f8599 perf(api): optimize cookie validation with fail-fast strategy
Before: Pre-validate cookie before every request (2-4 API calls per activity)
After: Direct request, only validate on 4xx error (1-2 API calls per activity)

Changes:
- Remove pre-validation step in fetchActivityData
- Keep existing 4xx error handling with re-login logic
- Add debug log to track cookie usage

Impact: ~20-30% reduction in API calls for normal scenarios
Benefit: Faster scanning, less load on engage API
2026-04-06 21:37:06 -04:00
JamesFlare1212
32dee6b161 fix(cache): resolve scanning stop issue and add cache TTL management
- Fix Redis SCAN cursor type conversion (Buffer to String) to prevent early termination
- Add progress logging in initializeClubCache (every 100 activities with summary)
- Add Redis memory limits (512MB with LRU eviction policy)
- Implement cache TTL: 24h for normal data, 1h for error states (allows retry)
- Fix Docker permission issue by running app container as root
- Add TTL configuration to .env and example.env

Root cause: SCAN cursor comparison failed due to type mismatch (Buffer vs String)
Impact: Scanning now processes all 5000+ IDs instead of stopping at ~300
2026-04-06 21:03:30 -04:00
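The cursor fix in the commit above comes down to normalising the SCAN cursor to a string before comparing it with "0". The sketch below assumes a generic scan callback rather than any specific Redis client API; ScanFn, Cursor, and scanAllKeys are hypothetical names.

```typescript
// A cursor as a client might return it: a string, or a Buffer-like object with toString().
type Cursor = string | { toString(): string };

// Hypothetical shape of one SCAN call: takes the current cursor, returns the next cursor and a page of keys.
type ScanFn = (cursor: string) => Promise<[Cursor, string[]]>;

async function scanAllKeys(scan: ScanFn): Promise<string[]> {
  const keys: string[] = [];
  let cursor = "0";
  do {
    const [next, page] = await scan(cursor);
    keys.push(...page);
    // Normalise before comparing: a Buffer never strictly equals the string "0".
    cursor = String(next);
  } while (cursor !== "0");
  return keys;
}
```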
JamesFlare1212
ee8cccc755 chore(docker): limit app container log size to 15MB with 3 file rotation 2026-04-06 18:47:31 -04:00
JamesFlare1212
02e0e6cafe chore(docker): limit app container log size to 15MB with 3 file rotation 2026-04-06 18:45:20 -04:00
JamesFlare1212
0b9a42c7f3 fix(auth): add login lock to prevent concurrent Playwright login attempts 2026-04-06 18:25:52 -04:00
JamesFlare1212
480ba14688 fix(warp-proxy): host.docker.internal 2026-04-06 18:19:48 -04:00
JamesFlare1212
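352e32d38b test: verify proxy functionality and improve documentation
Test results:
- Warp proxy service starts successfully
- SOCKS5 proxy works (warp=on)
- HTTP proxy works
- Playwright + proxy integration succeeds
- ⚠️ DNS resolution issue found; using an IP address is recommended

Documentation updates:
- PROXY-TESTING.md: full test report and troubleshooting
- Includes test scripts and best practices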
2026-04-06 17:06:43 -04:00
JamesFlare1212
d0a0abed68 update: warp-proxy docker-compose.yaml 2026-04-06 16:43:10 -04:00
JamesFlare1212
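4a97057825 feat: add optional proxy support
New features:
- Integrate the Cloudflare WARP SOCKS5 proxy service
- Toggle the proxy via the USE_PROXY environment variable
- Support custom HTTP/HTTPS/SOCKS5 proxy servers
- Manage the proxy service with a docker compose profile

Configuration:
- USE_PROXY=true enables the proxy
- ALL_PROXY/HTTP_PROXY/HTTPS_PROXY set a custom proxy
- docker compose --profile proxy up starts the warp service

File changes:
- docker-compose.yaml: add warp-proxy service
- playwright-auth.ts: add proxy configuration logic
- example.env: add proxy environment variables
- PROXY.md: usage documentation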
2026-04-06 16:37:54 -04:00
JamesFlare1212
4e04063469 fix: move playwright to production dependencies
Problem: the Docker build uses the --production flag, so playwright cannot be found

Fix: move @playwright/test from devDependencies to dependencies
2026-04-06 16:18:27 -04:00
JamesFlare1212
a21806dfca remove: playwright-report 2026-04-06 16:10:15 -04:00
JamesFlare1212
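a8f468a497 feat: automate cookie acquisition and validation with Playwright
Main changes:
- Add Playwright login authentication service (services/playwright-auth.ts)
- Refactor get-activity.ts to log in with Playwright instead of Axios
- Implement automatic cookie expiry detection and retry
- Update Docker configuration to support running Playwright browsers
- Add startup script that validates and refreshes cookies automatically
- Improve error handling: distinguish 4xx (authentication failure) from 5xx (server error)

Technical details:
- Remove legacy login_template.txt and nkcs-engage.cookie.txt
- Add startup.sh to validate cookies on startup
- Improve cookie validation logic with exponential-backoff retries
- Dockerfile installs Playwright system dependencies
- docker-compose.yaml adds volumes and health checks

Tests:
- Add auth.spec.ts automated tests
- Add get-cookies.ts and test-cookies-validity.ts utility scripts
- Verify that 401/500/000 and similar error scenarios are handled correctly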
2026-04-06 16:05:38 -04:00
JamesFlare1212
b18b8a85e0 feat: new S3 public URL option 2026-03-15 19:40:59 -04:00
JamesFlare1212
cb7f99dc09 update engage login template 2025-12-09 19:31:32 +08:00
6ae25329e9 Update engage-api/login_template.txt 2025-08-29 09:48:08 +02:00
JamesFlare1212
bd11e5971c improve: rank academicYear in descending order 2025-05-14 15:09:17 -04:00
JamesFlare1212
d81078c62d feat: endpoint /v1/activity/list?isStudentLed={true/false} 2025-05-14 00:30:43 -04:00
JamesFlare1212
2db16d5e80 update: redis 8.0 2025-05-13 23:24:09 -04:00
JamesFlare1212
7ba5f8f00f feat: skip duplicate image on remote 2025-05-12 23:46:25 -04:00
JamesFlare1212
8598571f72 improve: code structure 2025-05-12 21:45:57 -04:00
JamesFlare1212
2100bd04ca improve: semesterCost and poorWeatherPlan format 2025-05-12 20:37:26 -04:00
JamesFlare1212
8136c76d46 feat: convert image into .avif 2025-05-12 18:41:15 -04:00
JamesFlare1212
1996b1e29c feat: new filters on /v1/activity/list 2025-05-12 16:32:56 -04:00
JamesFlare1212
1d1d82fa60 feat: new api endpoint, /v1/activity/category and /v1/activity/academicYear 2025-05-12 01:38:25 -04:00
JamesFlare1212
2543e56ec4 init: port to typescript and bun 2025-05-10 23:39:39 -04:00