
[feature](routine-load) Add last task schedule time to routine load jobs #65166

Open
0AyanamiRei wants to merge 9 commits into apache:master from 0AyanamiRei:feature/routine-load-last-task-schedule-time


@0AyanamiRei 0AyanamiRei commented Jul 2, 2026

Contributor

What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary:

Add LAST_TASK_SCHEDULE_TIME to information_schema.routine_load_jobs so users can see, at the job level, when a valid routine load task was last scheduled. The scheduler records the timestamp only after confirming that the task still belongs to the job, and the value is exposed through the FE thrift struct and the BE schema scanner. The regression test keeps the abnormal-pause system table coverage and verifies the new field with a real Kafka-backed scheduled routine load job.
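
The ordering described above (validate first, record second) can be sketched as follows. This is only an illustration in Python, not the Doris code: the real scheduler lives in the Java FE, and `Job`, `owns`, and `schedule_task` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Job:
    """Hypothetical stand-in for a routine load job."""
    name: str
    task_ids: Set[int] = field(default_factory=set)
    # Runtime-only value; None means no task has been scheduled yet.
    last_task_schedule_time_ms: Optional[int] = None

    def owns(self, task_id: int) -> bool:
        return task_id in self.task_ids


def schedule_task(job: Job, task_id: int, now_ms: int) -> bool:
    """Record the schedule time only if the task still belongs to the job."""
    if not job.owns(task_id):
        # Stale task: skip it and leave the job-level timestamp untouched.
        return False
    job.last_task_schedule_time_ms = now_ms
    return True


job = Job(name="test_routine_load_job", task_ids={1})
print(schedule_task(job, task_id=2, now_ms=1_000), job.last_task_schedule_time_ms)
print(schedule_task(job, task_id=1, now_ms=2_000), job.last_task_schedule_time_ms)
```

With this ordering, a task that was already replaced or removed cannot move the job-level timestamp, so the column only reflects tasks that were actually valid when scheduled.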

Example:

SHOW COLUMNS FROM information_schema.routine_load_jobs LIKE 'LAST_TASK_SCHEDULE_TIME';

SELECT JOB_NAME, LAST_TASK_SCHEDULE_TIME
FROM information_schema.routine_load_jobs
WHERE JOB_NAME = '<routine_load_job_name>';

Example output:

+--------------------------+-------------------------+
| JOB_NAME                 | LAST_TASK_SCHEDULE_TIME |
+--------------------------+-------------------------+
| test_routine_load_job    | 2026-07-02 19:58:18     |
+--------------------------+-------------------------+

Release note

Add LAST_TASK_SCHEDULE_TIME to information_schema.routine_load_jobs.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. information_schema.routine_load_jobs now exposes LAST_TASK_SCHEDULE_TIME.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

0AyanamiRei and others added 5 commits July 2, 2026 17:16
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: The routine load job system table gained LAST_TASK_SCHEDULE_TIME in the FE schema and thrift struct, but the BE information schema scanner did not yet expose or fill the new column. The scheduler also updated the job-level timestamp before confirming that the task still belonged to the job. This change wires the BE scanner to the new thrift field, updates the job-level timestamp only after task validity is checked, and extends the routine load system table regression case to query the new column.

### Release note

Add LAST_TASK_SCHEDULE_TIME to information_schema.routine_load_jobs.

### Check List (For Author)

- Test:
    - Manual test: ./build-support/check-format.sh
    - Manual test: git diff --check
    - Regression test: Not run (requires local Doris and Kafka cluster; to be run in final validation)
- Behavior changed: Yes. information_schema.routine_load_jobs exposes LAST_TASK_SCHEDULE_TIME.
- Does this need documentation: No
@0AyanamiRei

Contributor Author

run buildall

0AyanamiRei and others added 3 commits July 2, 2026 20:01
### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#65166

Problem Summary: The routine load system table regression checked LAST_TASK_SCHEDULE_TIME on an invalid Kafka topic path. That job can pause while refreshing Kafka partitions before any task reaches RoutineLoadTaskScheduler, so the job-level task schedule time is expected to stay empty. This change keeps the abnormal-pause system table coverage, adds a real Kafka topic and scheduled routine load job for the LAST_TASK_SCHEDULE_TIME assertion, and marks the job-level timestamp field as transient to make its runtime-only semantics explicit.
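
A rough picture of the runtime-only semantics, assuming "transient" means the timestamp is not written with the persisted job state. The sketch is Python rather than the FE's Java, JSON merely stands in for whatever serialization Doris uses, and `JobState` and `TRANSIENT_FIELDS` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Fields that are deliberately not persisted (hypothetical list).
TRANSIENT_FIELDS = {"last_task_schedule_time_ms"}


@dataclass
class JobState:
    name: str
    state: str
    # Runtime-only: stays None until a task is actually scheduled.
    last_task_schedule_time_ms: Optional[int] = None

    def to_json(self) -> str:
        # Persist everything except the transient fields.
        persisted = {k: v for k, v in vars(self).items() if k not in TRANSIENT_FIELDS}
        return json.dumps(persisted)

    @classmethod
    def from_json(cls, text: str) -> "JobState":
        # Transient fields fall back to their defaults on reload.
        return cls(**json.loads(text))


running = JobState("test_routine_load_job", "RUNNING", last_task_schedule_time_ms=2_000)
restored = JobState.from_json(running.to_json())
print(restored.last_task_schedule_time_ms)
```

Under that assumption, a reloaded job and a job that paused before any task reached the scheduler look the same: both report an empty schedule time until the next valid task is scheduled.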

### Release note

None

### Check List (For Author)

- Test:
    - Manual test: git diff --check
    - Manual test: SHOW COLUMNS FROM information_schema.routine_load_jobs LIKE 'LAST_TASK_SCHEDULE_TIME' and SELECT JOB_NAME, LAST_TASK_SCHEDULE_TIME for a scheduled routine load job
    - Regression test: TMPDIR=/data/data3/huangruixin/tmp/codex-build ./run-regression-test.sh --run -d load_p0/routine_load -s test_routine_load_job_info_system_table
- Behavior changed: No
- Does this need documentation: No
Support LastTaskScheduleTime field in SHOW ROUTINE LOAD command
to be consistent with information_schema.routine_load_jobs system table.

The new field is appended as the last column after ComputeGroup,
using the same getLastTaskScheduleTimeString() method to ensure
consistent semantics and display format.
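
A small sketch of why sharing one formatter keeps both paths consistent. It is written in Python for brevity and is not the FE code: `last_task_schedule_time_string` is a hypothetical analogue of getLastTaskScheduleTimeString(), the `YYYY-MM-DD HH:MM:SS` layout is inferred from the example output in the PR description, the empty string for unscheduled jobs is an assumption, and UTC is used only to keep the sketch deterministic.

```python
from datetime import datetime, timezone
from typing import List, Optional


def last_task_schedule_time_string(epoch_ms: Optional[int]) -> str:
    """Hypothetical analogue of getLastTaskScheduleTimeString()."""
    if epoch_ms is None:
        return ""  # assumption: jobs with no scheduled task show an empty value
    moment = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def show_routine_load_row(existing_columns: List[str], epoch_ms: Optional[int]) -> List[str]:
    # The new value goes after the existing columns, i.e. after ComputeGroup.
    return existing_columns + [last_task_schedule_time_string(epoch_ms)]


def information_schema_value(epoch_ms: Optional[int]) -> str:
    # Same formatter, so both paths render the identical string.
    return last_task_schedule_time_string(epoch_ms)


row = show_routine_load_row(["test_routine_load_job", "default_compute_group"], 0)
print(row[-1] == information_schema_value(0))
```

Because both call sites delegate to one function, a later change to the display format cannot make SHOW ROUTINE LOAD and the system table drift apart.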

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cheduleTime

Add regression test to verify:
1. SHOW ROUTINE LOAD includes LastTaskScheduleTime as the last column
2. The value is non-empty for scheduled jobs
3. The value is consistent between SHOW ROUTINE LOAD and information_schema.routine_load_jobs
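
The three checks can be pictured roughly like this. The sketch is Python and is not the regression case itself; `check_last_task_schedule_time` is a hypothetical helper, and the column list in the usage example is abbreviated for illustration.

```python
from typing import Dict, Sequence


def check_last_task_schedule_time(
    show_columns: Sequence[str],
    show_row: Sequence[str],
    info_schema_row: Dict[str, str],
) -> None:
    # 1. LastTaskScheduleTime is the last column of SHOW ROUTINE LOAD.
    if show_columns[-1] != "LastTaskScheduleTime":
        raise AssertionError("LastTaskScheduleTime is not the last column")
    # 2. A scheduled job reports a non-empty value.
    show_value = show_row[-1]
    if show_value == "":
        raise AssertionError("LastTaskScheduleTime is empty for a scheduled job")
    # 3. SHOW ROUTINE LOAD and the system table agree.
    if show_value != info_schema_row["LAST_TASK_SCHEDULE_TIME"]:
        raise AssertionError("values differ between the two paths")


check_last_task_schedule_time(
    ["Name", "ComputeGroup", "LastTaskScheduleTime"],  # abbreviated column list
    ["test_routine_load_job", "default_compute_group", "2026-07-02 19:58:18"],
    {"JOB_NAME": "test_routine_load_job", "LAST_TASK_SCHEDULE_TIME": "2026-07-02 19:58:18"},
)
print("all checks passed")
```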

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@0AyanamiRei

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 30085 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 4cc29400de54f112db8f3a2d7e2ead883288b8f6, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17734	4143	4076	4076
q2	2007	318	201	201
q3	10311	1448	837	837
q4	4687	477	339	339
q5	7504	879	608	608
q6	188	171	137	137
q7	785	837	632	632
q8	9371	1728	1700	1700
q9	5694	4486	4468	4468
q10	6810	1813	1538	1538
q11	496	332	325	325
q12	702	560	450	450
q13	18118	3487	2753	2753
q14	272	268	259	259
q15	q16	795	786	714	714
q17	1036	986	932	932
q18	6862	5792	5636	5636
q19	1189	1203	1033	1033
q20	775	691	576	576
q21	5668	2773	2567	2567
q22	440	370	304	304
Total cold run time: 101444 ms
Total hot run time: 30085 ms
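
The two Round 1 totals appear to be plain column sums. The bot output has no column headers, so the reading used here is an inference from the numbers rather than something the report states: the first numeric column is the cold run, the next two are hot runs, and the last is the smaller of the two hot runs. A short Python check under that assumption (`totals` is a hypothetical helper):

```python
ROUND_1 = """\
q1 17734 4143 4076 4076
q2 2007 318 201 201
q3 10311 1448 837 837
q4 4687 477 339 339
q5 7504 879 608 608
q6 188 171 137 137
q7 785 837 632 632
q8 9371 1728 1700 1700
q9 5694 4486 4468 4468
q10 6810 1813 1538 1538
q11 496 332 325 325
q12 702 560 450 450
q13 18118 3487 2753 2753
q14 272 268 259 259
q15 q16 795 786 714 714
q17 1036 986 932 932
q18 6862 5792 5636 5636
q19 1189 1203 1033 1033
q20 775 691 576 576
q21 5668 2773 2567 2567
q22 440 370 304 304
"""


def totals(report: str) -> tuple:
    """Sum the first (cold) and last (best hot) numeric columns of each row."""
    cold = hot = 0
    for line in report.splitlines():
        # Keep only numeric tokens; labels such as "q15 q16" are skipped.
        numbers = [int(tok) for tok in line.split() if tok.isdigit()]
        if len(numbers) < 4:
            continue
        cold += numbers[-4]
        hot += numbers[-1]
    return cold, hot


print(totals(ROUND_1))  # (101444, 30085)
```

The result matches the reported cold and hot totals, which supports the column reading above; q15 contributes no timings of its own in this output.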

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4404	4322	4323	4322
q2	295	311	211	211
q3	4664	4989	4410	4410
q4	2078	2161	1380	1380
q5	4474	4316	4359	4316
q6	231	179	128	128
q7	1737	2255	1787	1787
q8	2615	2214	2198	2198
q9	8180	8085	7782	7782
q10	4750	4776	4307	4307
q11	594	442	380	380
q12	755	773	557	557
q13	3336	3609	2954	2954
q14	305	308	278	278
q15	q16	728	765	644	644
q17	1395	1311	1321	1311
q18	7978	7391	7600	7391
q19	1230	1117	1167	1117
q20	2215	2211	1956	1956
q21	5301	4598	4486	4486
q22	531	463	401	401
Total cold run time: 57796 ms
Total hot run time: 52316 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 174515 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4cc29400de54f112db8f3a2d7e2ead883288b8f6, data reload: false

query5	4335	627	501	501
query6	452	219	207	207
query7	4834	607	349	349
query8	338	198	190	190
query9	8757	4092	4100	4092
query10	471	374	305	305
query11	5877	2374	2187	2187
query12	171	100	102	100
query13	1273	626	429	429
query14	6290	5323	4996	4996
query14_1	4340	4328	4327	4327
query15	213	205	182	182
query16	1060	502	457	457
query17	1148	743	600	600
query18	2635	493	362	362
query19	214	199	160	160
query20	116	113	110	110
query21	242	159	133	133
query22	13704	13616	13392	13392
query23	17569	16518	16369	16369
query23_1	16385	16347	16314	16314
query24	7554	1780	1339	1339
query24_1	1344	1312	1325	1312
query25	577	490	401	401
query26	1346	357	214	214
query27	2580	621	379	379
query28	4491	2037	2066	2037
query29	1102	643	550	550
query30	347	255	228	228
query31	1106	1099	973	973
query32	114	61	59	59
query33	511	311	246	246
query34	1206	1156	607	607
query35	776	784	667	667
query36	1369	1404	1236	1236
query37	154	109	87	87
query38	1877	1714	1655	1655
query39	945	918	903	903
query39_1	888	895	909	895
query40	241	164	140	140
query41	65	62	62	62
query42	98	94	92	92
query43	324	326	282	282
query44	1433	794	785	785
query45	207	182	181	181
query46	1084	1225	744	744
query47	2364	2348	2254	2254
query48	410	420	297	297
query49	587	426	304	304
query50	1070	429	334	334
query51	4458	4359	4323	4323
query52	84	86	77	77
query53	265	278	210	210
query54	275	231	212	212
query55	74	70	66	66
query56	316	283	280	280
query57	1467	1441	1328	1328
query58	273	244	259	244
query59	1552	1675	1413	1413
query60	301	266	257	257
query61	149	177	149	149
query62	688	649	594	594
query63	243	205	213	205
query64	2501	794	625	625
query65	4892	4858	4764	4764
query66	1806	514	388	388
query67	29701	29534	29484	29484
query68	3225	1570	1030	1030
query69	410	303	278	278
query70	1101	913	966	913
query71	352	317	309	309
query72	2918	2671	2364	2364
query73	811	797	458	458
query74	5122	4988	4820	4820
query75	2614	2569	2225	2225
query76	2310	1163	779	779
query77	348	374	292	292
query78	12455	12478	11945	11945
query79	1440	1179	764	764
query80	1285	541	450	450
query81	551	329	283	283
query82	561	163	118	118
query83	388	317	290	290
query84	282	159	130	130
query85	966	650	520	520
query86	421	304	292	292
query87	1828	1824	1768	1768
query88	3748	2819	2822	2819
query89	457	409	362	362
query90	1907	201	202	201
query91	206	191	161	161
query92	63	61	58	58
query93	1660	1535	1014	1014
query94	718	354	345	345
query95	772	494	476	476
query96	999	782	352	352
query97	2693	2713	2598	2598
query98	219	207	196	196
query99	1174	1143	1020	1020
Total cold run time: 260122 ms
Total hot run time: 174515 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.34 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 4cc29400de54f112db8f3a2d7e2ead883288b8f6, data reload: false

query1	0.00	0.00	0.00
query2	0.09	0.05	0.05
query3	0.26	0.14	0.13
query4	1.62	0.14	0.14
query5	0.23	0.23	0.22
query6	1.26	1.06	1.04
query7	0.04	0.01	0.01
query8	0.06	0.04	0.03
query9	0.37	0.31	0.30
query10	0.56	0.58	0.55
query11	0.20	0.14	0.14
query12	0.18	0.15	0.15
query13	0.46	0.47	0.48
query14	1.00	1.01	1.00
query15	0.60	0.59	0.60
query16	0.34	0.32	0.31
query17	1.12	1.09	1.12
query18	0.22	0.22	0.21
query19	2.13	1.96	1.99
query20	0.02	0.01	0.01
query21	15.46	0.18	0.13
query22	5.03	0.06	0.05
query23	16.16	0.30	0.13
query24	3.06	0.42	0.31
query25	0.12	0.05	0.04
query26	0.77	0.23	0.15
query27	0.04	0.04	0.03
query28	3.57	0.99	0.51
query29	12.48	4.32	3.47
query30	0.28	0.16	0.16
query31	2.78	0.60	0.32
query32	3.24	0.60	0.48
query33	3.18	3.28	3.26
query34	15.63	4.22	3.54
query35	3.53	3.55	3.55
query36	0.56	0.44	0.43
query37	0.09	0.07	0.07
query38	0.05	0.04	0.04
query39	0.04	0.02	0.03
query40	0.18	0.17	0.16
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 97.19 s
Total hot run time: 25.34 s

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 62.50% (10/16) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.90% (28990/39768)
Line Coverage 56.46% (313327/554967)
Region Coverage 53.06% (261355/492593)
Branch Coverage 54.04% (114515/211910)

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 13.56% (16/118) 🎉
Increment coverage report
Complete coverage report
