[CRASH] Accounting ctx free in free_cell context #3498

Open
vladpaiu opened this issue Oct 18, 2024 · 2 comments

@vladpaiu
Member

OpenSIPS version you are running

3.4.5

Crash Core Dump

#0  0x00007f1371367de7 in dlg_ctx_get_ptr (dlg=0x70616d7074723d61, pos=0) at dlg_ctx.c:65
#1  0x00007f136e6994b3 in free_acc_ctx (ctx=0x7f1372733868) at acc_logic.c:175
#2  unref_acc_ctx (ctx=0x7f1372733868) at acc_logic.c:1219
#3  0x00007f1371fe617c in empty_tmcb_list (head=head@entry=0x7f1372734288) at t_hooks.c:53
#4  0x00007f1371fb81e2 in free_cell (dead_cell=0x7f1372734218) at h_table.c:127
#5  0x00007f137200b312 in wait_handler (wait_tl=0x7f1372734298) at timer.c:478
#6  timer_routine (ticks=<optimized out>, set=<optimized out>) at timer.c:1115
#7  0x000055ca5befe3a6 in handle_timer_job () at timer.c:1018
#8  0x000055ca5c061d60 in handle_io (idx=<optimized out>, event_type=<optimized out>, fm=<optimized out>) at net/net_udp.c:299
#9  io_wait_loop_epoll (repeat=<optimized out>, t=<optimized out>, h=<optimized out>) at net/../io_wait_loop.h:311
#10 0x000055ca5c0675b1 in udp_start_processes (chd_rank=chd_rank@entry=0x55ca5c1baff8 <chd_rank>, startup_done=startup_done@entry=0x0) at net/net_udp.c:528
#11 0x000055ca5bdcb430 in main_loop () at main.c:237

(gdb) f 1
#1  0x00007f136e6994b3 in free_acc_ctx (ctx=0x7f1372733868) at acc_logic.c:175
175	acc_logic.c: No such file or directory.
(gdb) p T
$1 = (struct cell *) 0x7f1372741818
(gdb) p T->dialog_ctx
$2 = (void *) 0x70616d7074723d61

Note that T is a dangling pointer (probably pointing to an already deallocated transaction), so T->dialog_ctx holds garbage.
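One quick way to check whether a bogus pointer is really reused memory is to read its bytes as text. The sketch below is a hypothetical helper (`ptr_as_ascii` is not part of OpenSIPS or gdb) and assumes a little-endian host, where the least significant byte of the value is stored first in memory.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: render the 8 bytes of `value` in little-endian
 * memory order (least significant byte first). Non-printable bytes are
 * shown as '.'. `out` must hold at least 9 bytes. Returns the number of
 * printable bytes found. */
int ptr_as_ascii(uint64_t value, char out[9])
{
    int printable = 0;

    assert(out != NULL);
    memset(out, '.', 8);          /* default for non-printable bytes */
    out[8] = '\0';

    for (size_t i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xff);
        if (isprint(byte)) {
            out[i] = (char)byte;
            printable++;
        }
    }
    return printable;
}
```

Called with the value from the dump, `ptr_as_ascii(0x70616d7074723d61ULL, buf)` fills `buf` with `a=rtpmap`, which looks like SDP text. That is consistent with the transaction memory having been freed and reused for message data, although it does not prove it.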

Describe the traffic that generated the bug
Unknown

To Reproduce
Unknown

Relevant System Logs
None

OS/environment information
Debian 11.10, installed from official OpenSIPS repo.

Additional context
OpenSIPS runs without B2B, generates ACC records (cdrs | failed) with dialog context, sends push notifications via manual notify_on_event, and runs local_route for various script processing.

@vladpaiu
Member Author

I collected debug logs for this. The crash happens inside a process that never handles SIP (i.e. a TCP worker on a server that does not receive TCP traffic).

The process handles an MI termination of an early dialog, looks up the T and sets it (leaking it from this point on):

Oct 22 07:10:20 [56] DBG:tm:t_unref_cell: UNREF_UNSAFE: [0x7f8be987bee8] after is 0
Oct 22 07:10:20 [56] DBG:tm:t_lookup_ident: transaction found
Oct 22 07:10:20 [56] DBG:tm:t_lookup_ident: REF_UNSAFE:[0x7f8be987bee8] after is 1
Oct 22 07:10:20 [56] DBG:dialog:dlg_end_dlg: trying to find transaction with hash_index = 19996 and label = 396874499
Oct 22 07:10:20 [56] DBG:dialog:init_dlg_term_reason: Setting DLG term reason to [MI Termination] 

Later, the same process runs a TMCB_TRANS_DELETED callback:

Oct 22 07:18:13 [56] DBG:tm:run_any_trans_callbacks: trans=0x7f8be987bee8, callback type 4096, id 4 entered
Oct 22 07:18:13 [56] DBG:tm:delete_cell: delete transaction 0x7f8be987bee8
Oct 22 07:18:13 [56] DBG:tm:wait_handler: removing 0x7f8be987bee8 from table

There is some 'strangeness' in https://github.com/OpenSIPS/opensips/blob/master/modules/tm/t_hooks.c#L225 : this function does not call set_t with the new transaction, it only restores T afterwards.
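To make the difference concrete, here is a toy model of the two behaviours. None of this is the real tm code: `toy_cell`, `current_t` and both runner functions are hypothetical stand-ins for a per-process "current transaction" pointer and a callback runner.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT the real tm structures or API. */
struct toy_cell { int id; };

/* Per-process "current transaction" pointer (plays the role of T). */
static struct toy_cell *current_t = NULL;

typedef void (*toy_cb)(struct toy_cell *trans, void *param);

/* Variant A: restore only. The callback runs while current_t still
 * holds whatever value was left there before the call. */
void run_cb_restore_only(struct toy_cell *trans, toy_cb cb, void *param)
{
    struct toy_cell *backup = current_t;

    cb(trans, param);
    current_t = backup;
}

/* Variant B: save, set, run, restore. The callback runs with current_t
 * pointing at the transaction it was invoked for. */
void run_cb_save_set_restore(struct toy_cell *trans, toy_cb cb, void *param)
{
    struct toy_cell *backup = current_t;

    current_t = trans;
    cb(trans, param);
    current_t = backup;
}

/* Example callback: records which transaction the global points at. */
void record_current(struct toy_cell *trans, void *param)
{
    (void)trans;
    *(struct toy_cell **)param = current_t;
}
```

In this model, a callback that reads the global sees the stale pointer under variant A and the transaction being processed under variant B. Both variants put the stale value back at the end, so variant B alone would not clear a pointer that was leaked earlier.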

Finally, when the acc cleanup happens in the same process context, the crash occurs:

(gdb) bt
#0  0x00007f8be8081de7 in dlg_ctx_get_ptr (dlg=0x687400000000002c, pos=0) at dlg_ctx.c:65
#1  0x00007f8be53ba4b3 in free_acc_ctx (ctx=0x7f8be9870ef8) at acc_logic.c:175
#2  unref_acc_ctx (ctx=0x7f8be9870ef8) at acc_logic.c:1219
#3  0x000056245bb11398 in context_destroy (ctxtype=ctxtype@entry=CONTEXT_DIALOG, ctx=ctx@entry=0x7f8be98686b8) at context.c:111
#4  0x00007f8be80bf8ec in free_dlg_dlg (dlg=dlg@entry=0x7f8be9868590) at dlg_hash.c:174
#5  0x00007f8be80c3883 in destroy_dlg (dlg=dlg@entry=0x7f8be9868590) at dlg_hash.c:271
#6  0x00007f8be80d1e37 in _unref_dlg (dlg=0x7f8be9868590, cnt=<optimized out>) at dlg_hash.c:1040
#7  0x00007f8be8d0017c in empty_tmcb_list (head=head@entry=0x7f8be9886c88) at t_hooks.c:53
#8  0x00007f8be8cd21e2 in free_cell (dead_cell=0x7f8be9886c18) at h_table.c:127

(gdb) f 1
#1  0x00007f8be53ba4b3 in free_acc_ctx (ctx=0x7f8be9870ef8) at acc_logic.c:175
175	acc_logic.c: No such file or directory.
(gdb) p T
$1 = (struct cell *) 0x7f8be987bee8

At this point, I am unsure whether #3500 is the right fix. Should the dialog module clean up the T after it manually looks it up in https://github.com/OpenSIPS/opensips/blob/master/modules/dialog/dlg_req_within.c#L475 ?
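As a rough illustration of what "cleaning up the T" could mean, the sketch below models a lookup that takes a reference and sets a per-process pointer, followed by a caller that releases both once it is done. All names (`toy_cell`, `current_t`, `toy_lookup_ident`, `toy_end_dlg_with_cleanup`) are hypothetical; this is not the dialog or tm module API and not the change proposed in #3500.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT the real tm/dialog structures or API. */
struct toy_cell { int ref; int label; };

/* Per-process "current transaction" pointer (plays the role of T). */
static struct toy_cell *current_t = NULL;

/* Find a transaction by label. On success, take a reference, set the
 * per-process pointer as a side effect, and return 1. Return -1 if no
 * entry matches. */
int toy_lookup_ident(struct toy_cell *table, size_t n, int label,
                     struct toy_cell **out)
{
    assert(table != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (table[i].label == label) {
            table[i].ref++;
            current_t = &table[i];
            *out = &table[i];
            return 1;
        }
    }
    return -1;
}

/* Caller that undoes both side effects of the lookup when finished. */
int toy_end_dlg_with_cleanup(struct toy_cell *table, size_t n, int label)
{
    struct toy_cell *backup = current_t;   /* value before the lookup */
    struct toy_cell *t = NULL;

    if (toy_lookup_ident(table, n, label, &t) < 0)
        return -1;

    /* ... use t here, e.g. to cancel or reply ... */

    t->ref--;              /* drop the reference taken by the lookup */
    current_t = backup;    /* do not leave the process pointing at t */
    return 0;
}
```

With this shape, the process no longer carries a pointer to the transaction after the MI command returns, so a later free of that transaction cannot leave the per-process pointer dangling.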

@vladpaiu
Member Author

A slightly different example this time, from a terminated dialog that was not in the early state:

#0  0x00007f4a3b556de7 in dlg_ctx_get_ptr (dlg=0x900, pos=0) at dlg_ctx.c:65
#1  0x00007f4a388884b3 in free_acc_ctx (ctx=0x7f4a3c8ba138) at acc_logic.c:175
#2  unref_acc_ctx (ctx=0x7f4a3c8ba138) at acc_logic.c:1219
#3  0x00005626bc50e398 in context_destroy (ctxtype=ctxtype@entry=CONTEXT_TRAN, ctx=ctx@entry=0x7f4a3c8b8e98) at context.c:111
#4  0x00007f4a3c1a71f3 in free_cell (dead_cell=0x7f4a3c8b7450) at h_table.c:129
#5  0x00007f4a3c1fa312 in wait_handler (wait_tl=0x7f4a3c8b74d0) at timer.c:478
#6  timer_routine (ticks=<optimized out>, set=<optimized out>) at timer.c:1115
#7  0x00005626bc6183a6 in handle_timer_job () at timer.c:1018
#8  0x00005626bc7738cd in handle_io (fm=0x7f4a3e3a2918, idx=3, event_type=1) at net/net_tcp_proc.c:204
#9  0x00005626bc774d45 in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at net/../io_wait_loop.h:305
#10 tcp_worker_proc_loop () at net/net_tcp_proc.c:442
#11 0x00005626bc76e3ce in tcp_start_processes (chd_rank=chd_rank@entry=0x5626bc8d4ff8 <chd_rank>, startup_done=startup_done@entry=0x0) at net/net_tcp.c:2119
#12 0x00005626bc4e5447 in main_loop () at main.c:243
#13 main (argc=<optimized out>, argv=<optimized out>) at main.c:966

(gdb) p *(struct dlg_cell *)dead_cell->dialog_ctx
$3 = {ref = 0, next = 0x0, prev = 0x0, h_id = 1604530918, h_entry = 2325, state = 5, lifetime = 21600, lifetime_dirty = 0, locked_by = 0, start_ts = 0, flags = 90392, 
  from_rr_nb = 0,

We have a dialog with ref 0, so I am not even sure it is safe to access it from the ACC context.
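For comparison, this is the general reference-counting rule that would make such an access safe: whoever stores a pointer takes a reference for as long as it keeps that pointer. The sketch is a toy model with hypothetical names (`toy_dlg`, `toy_ctx`, `toy_ctx_attach`, `toy_ctx_free`) and makes no claim about how the acc or dialog modules actually manage their references.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT the real dialog/acc structures. */
struct toy_dlg { int ref; int destroyed; };
struct toy_ctx { struct toy_dlg *dlg; };

void toy_ref(struct toy_dlg *d)
{
    d->ref++;
}

/* Drop one reference; the object is torn down when the count hits 0. */
void toy_unref(struct toy_dlg *d)
{
    assert(d->ref > 0);
    if (--d->ref == 0)
        d->destroyed = 1;   /* a real implementation would free here */
}

/* Storing the pointer takes a reference, so the dialog cannot reach
 * ref 0 while the context still points at it. */
void toy_ctx_attach(struct toy_ctx *c, struct toy_dlg *d)
{
    toy_ref(d);
    c->dlg = d;
}

/* Freeing the context releases the reference it held. */
void toy_ctx_free(struct toy_ctx *c)
{
    if (c->dlg != NULL) {
        toy_unref(c->dlg);
        c->dlg = NULL;
    }
}
```

Under this rule, a context would never observe a dialog with ref 0 through its own stored pointer. Seeing ref = 0 in the dump suggests the pointer was held without a matching reference, or that the memory had already been released.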
