
reformulated neighboring codes, introducing full wavelet transformation and image denoising #64

Merged 32 commits into adaptive-cfd:master on Sep 30, 2024

Conversation

Arcadia197 (Collaborator)

Reformulated neighboring codes

Previously, the code could only handle one neighbor for each ghost-layer patch. However, the full wavelet transformation needs several neighbors on different levels for the same patch. A new neighbor formulation enables the coexistence of neighbors on different levels. For all 56 possible patches in 3D we now have 3x56 possible neighbor relations. The 2D and 3D codes have been merged to avoid confusion between neighbor codes. Taken from the code, the neighbor formulations are now constructed the following way:

!> neighbor codes: \n
!  ---------------
!>   1- 56 : lvl_diff =  0  (same level)
!>  57-112 : lvl_diff = +1  (coarser neighbor)
!> 113-168 : lvl_diff = -1  (finer   neighbor)
!> For each range, the different 56 entries are:
!> 01-08 : X side (4-, 4+)
!> 09-16 : Y-side (4-, 4+)
!> 17-24 : Z-side (4-, 4+)
!> 25-32 : X-Y edge (2--, 2+-, 2-+, 2++)
!> 33-40 : X-Z edge (2--, 2+-, 2-+, 2++)
!> 41-48 : Y-Z edge (2--, 2+-, 2-+, 2++)
!> 49-56 : corners (---, +--, -+-, ++-, --+, +-+, -++, +++)

As one can see, sides have 4 possible codes for the 4 possible configurations of blocks on the finer level, edges have 2 and corners 1 possible configuration. 2D codes only have the X- and Y-sides with 2 configurations and the X-Y edges with 1 configuration each.
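As a sketch, the indexing scheme above can be reproduced in a few lines of Python (the function names are illustrative, not WABBIT's actual API): a multiple of 56 selects the level difference, and the position within 1-56 identifies the patch.

```python
# Illustrative sketch of the neighbor-code layout (hypothetical names, not
# WABBIT's API). Codes 1-56 identify the patch; an offset of 0, 56 or 112
# encodes the level difference 0, +1 or -1 as in the table above.

PATCHES_PER_LEVEL = 56

def neighbor_code(patch, lvl_diff):
    """Map a patch code (1-56) and a level difference to the global code.

    lvl_diff =  0 -> codes   1- 56 (same level)
    lvl_diff = +1 -> codes  57-112 (coarser neighbor)
    lvl_diff = -1 -> codes 113-168 (finer neighbor)
    """
    assert 1 <= patch <= PATCHES_PER_LEVEL
    offset = {0: 0, +1: 1, -1: 2}[lvl_diff]
    return offset * PATCHES_PER_LEVEL + patch

def patch_kind(patch):
    """Classify a patch code into side / edge / corner per the table above."""
    if patch <= 24:
        return "side"    # 8 codes per axis: X (1-8), Y (9-16), Z (17-24)
    if patch <= 48:
        return "edge"    # 8 codes per axis pair: X-Y, X-Z, Y-Z
    return "corner"      # 49-56: one code per corner
```

For example, `neighbor_code(1, -1)` gives 113, the first finer-level relation.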

With this change in formulation, the code concerned with computing the boundaries has been revisited and greatly simplified. It should hopefully be clear enough now.

Full wavelet transformation - adapt_tree_cvs

The full wavelet transformation on all levels has been implemented. It works by doing the following steps:

  1. Initialize the full grid from the leaf-layer down to level JMin. New blocks are marked as empty with the refinement flag.
2. Do the full wavelet decomposition, working downwards from the leaf-layer. Along the way, interior blocks (blocks which are neither leaf-blocks nor root-blocks on level JMin) can only be decomposed once they have correctly synced values from all neighbors on the same level. This avoids the need for any coarse extension on blocks other than the leaf-layer blocks. It also effectively creates an iterative decomposition algorithm that starts on the leaf-layer and in later steps mostly works level-wise down to JMin, because blocks wait for their medium-level neighbors. Luckily, as the grid information is already present, only block values and lgt-data need to be synced in each step.
3. Adaption of the grid is done only once, after the full wavelet transformation has completed. With this, mask and norm information is computed only once. All insignificant blocks can then be deleted.
4. Full wavelet reconstruction is applied level-wise from the lowest level up to the leaf-layer, updating the SC of all daughters along the way. Afterwards, all non-leaf blocks are deleted, as they are currently not used for further steps.
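The level-wise ordering of steps 2 and 4 can be illustrated with a self-contained 1D toy using the orthogonal Haar wavelet. WABBIT uses biorthogonal CDF wavelets on a block-based tree, so this is only an analogy for the decompose-down / reconstruct-up structure, not the actual scheme:

```python
# Toy 1D analogue of steps 1-4 above, using the orthogonal Haar wavelet
# instead of WABBIT's biorthogonal CDF wavelets. It only illustrates the
# ordering: decompose from the finest (leaf) level down to JMin, then
# reconstruct level-wise back up; invertibility is checked at the end.

def haar_decompose(signal, j_min=0):
    """Decompose down to 2**j_min scaling coefficients (SC); collect WC per level."""
    sc, details = list(signal), []
    while len(sc) > 2 ** j_min:
        # one level: averages (SC) move one level down, differences (WC) are stored
        avg = [(sc[2 * i] + sc[2 * i + 1]) / 2 for i in range(len(sc) // 2)]
        dif = [(sc[2 * i] - sc[2 * i + 1]) / 2 for i in range(len(sc) // 2)]
        details.append(dif)
        sc = avg
    return sc, details

def haar_reconstruct(sc, details):
    """Invert haar_decompose, working level-wise from the coarsest level up."""
    for dif in reversed(details):
        sc = [v for a, d in zip(sc, dif) for v in (a + d, a - d)]
    return sc

sig = [4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 3.0, 9.0]
sc, wc = haar_decompose(sig)
assert haar_reconstruct(sc, wc) == sig  # the full transform is invertible
```

In the real algorithm the readiness condition on same-level neighbors decides when an interior block may be decomposed; in this toy every level is trivially ready.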

This algorithm has the following benefits:

  • The algorithm structure is clear and separated; each step lives in its own sub-function in the code and can be adapted individually
  • Splitting the wavelet decomposition into three steps (grid creation, decomposition and grid adaption) greatly reduces the amount of grid information that needs to be updated; in total, the active lists and neighbor relations are only computed twice!
  • With the full wavelet decomposition we have more control over the CE and can eliminate it for interior blocks which are passed along the way (blocks which are not leaf-blocks and will also not be leaf-blocks after adaption, i.e. which are only needed temporarily)
  • The full wavelet reconstruction reduces the minimum block size by removing all restrictions previously introduced by the CE. This means 12 for CDF42 and 16 for CDF44 and CDF62 are again possible; before, CDF62 required the minimal block size of 24 that CDF66 needs!

Image denoising

The iterative algorithm by Azzalini was implemented for image denoising. It was introduced as a new step after the full wavelet decomposition and before the grid adaption. Possibly, the CVS iterative algorithm will be introduced here as well, but it is not yet tested. A new post-processing tool (wabbit-post --denoise) applies the denoising algorithm to WABBIT files on a uniform grid, and a test was added to check the results. Tests show that the denoising algorithm works quite well for the biorthogonal wavelets.
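A generic sketch of the Azzalini-style iterative thresholding (my own reimplementation, not WABBIT's code; the tolerance and iteration cap are assumptions): the threshold T = sqrt(2·sigma²·ln N) is recomputed from the coefficients currently below it, which are treated as noise, until T converges.

```python
# Sketch of Azzalini-type iterative thresholding for denoising, as I
# understand the step described above; a generic reimplementation, not
# WABBIT's routine. tol and max_iter are illustrative choices.

import math

def azzalini_threshold(coeffs, tol=1e-12, max_iter=100):
    """Iteratively estimate the noise threshold from the coefficients."""
    n = len(coeffs)
    sigma2 = sum(c * c for c in coeffs) / n   # first guess: everything is noise
    thresh = math.sqrt(2.0 * sigma2 * math.log(n))
    for _ in range(max_iter):
        noise = [c for c in coeffs if abs(c) <= thresh]   # sub-threshold = noise
        if not noise:
            break
        sigma2 = sum(c * c for c in noise) / len(noise)   # re-estimate variance
        new_thresh = math.sqrt(2.0 * sigma2 * math.log(n))
        if abs(new_thresh - thresh) < tol:
            break
        thresh = new_thresh
    return thresh

def denoise(coeffs):
    """Zero all wavelet coefficients below the converged threshold."""
    t = azzalini_threshold(coeffs)
    return [c if abs(c) > t else 0.0 for c in coeffs]
```

On a coefficient set with a few large (coherent) entries and many small (noise) entries, the threshold converges to the noise level within a couple of iterations, so only the coherent part survives.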

Miscellaneous changes

  • Cosmetic changes to the output so it is a bit more beautiful. I work with this stuff daily, so some more lines or aligned output brighten my day
  • Merged some 2D / 3D formulations into a single code path to reduce the amount of redundant code
  • Option to dump block values and neighbor info, which is useful for debugging when I don't want to drag everything into ParaView
  • Some refinement flag magic; I hope it doesn't get out of hand yet why I use which flag where

- neighboring code was changed: first of all, neighboring indices were reassigned to 1-56, with 1-24 = sides, 25-48 = edges and 49-56 = corners.
- hvy_neighbor now has length 3*56, as for CVS different definitions can coexist
- the whole code for calc send bounds and calc recv bounds has been rewritten; it now uses formulas instead of lists, drastically reducing code clutter
- all patch definition moved to treelib library in neighborhood.f90
- module_mpi and module_wavelet then just include those and call them
- neighborhood definition now consistent between 2D and 3D
- more documentation will follow, but for now there is a small function write_neighborhood_info that can help
- unitTests have moved into their own unitTest module, which already existed but now has more tests
- some basic tests got more command line functionality and a new sync test was added to check if patches are synced correctly, this test is not added to any testing framework though yet
- find neighbors was rewritten to accommodate the new neighbor relations
- IO has option to save ghosts now as additional parameter, gonna add full hdf2vtk support for that soon
- CE now modifies separately inside the domain and on the ghost patches, the last is actually just used for overwriting values for coarser neighbors
- dump block was updated, it now also exists in fancy - this helps me with debugging what is going on if I don't want to use paraview and just check block results
- adaptive tests save and compare refined values only for unlifted wavelets to save space
- Ghost patches and interior patches are now always defined from the block that is requesting the indices; inverting takes place once we go from sender-perspective to receiver-perspective
- meaning that ijk_patches with the same indices for sender and receiver are not the corresponding parts; we need to invert the relation beforehand
- code was tested and should be consistent with what is written in output
- some aesthetic output changes
- because it was made unnecessary and in order to avoid confusion, I've removed it
- to accomodate, family relations now stretch over 16 indices for all levels
- I also did some more aesthetics and added sync test to ghost sync testing block for simulation runs
- and some very little further preparation for CVS
- now there is a string which defines what we want to do
- I hope that's clearer now
- made all available sync cases compatible with overfull CVS grid
- special cases for RHS which is leaf-only is implemented
- all other cases always assume that the full grid has valid values
- buffer zone was changed to Nwc+2; this changed some results for lifted cases. To be exact it changed blob_adaptive_CDF42, *_CDF44, *_CDF62 and acm_CDF44
- I updated only the values that changed
- also reduced the dataload of acm by reducing the saving frequency
- it works (for now)
- some differences of order 1e-11 are seen in blob tests though, need to investigate that
- also something in init function crashes for the adapt_tree loop which I am still investigating
- included more cases for the overfull grid so that the coarse extension is applied correctly
- loop now correctly creates mothers
- I did a rudimentary check of the block size criteria and yes, CDF42 works with BS=12 and CDF44 with BS=16
- There is still an induced difference between CVS and non-CVS loop results which I need to discuss with Thomas
- checked performance with simple tests and for 3D it takes around 15% longer. This possibly worsens with more blocks but it's not too bad
- I finished the CVS adapt tree loop, it still needs to be polished and checked more, but it looks good for now
- all parts of the CVS loop got neat functions
- Azzalini's iterative loop is implemented to estimate the thresholding for CVS or image denoising
- image denoising got a post function, testing invertibility as well
- invertibility also got its own unit test
- added two new tests, test results should look nice now
- more smaller changes
- loop now first creates whole grid and then fills it
- this means there are more refinement flags yejjjj (I am sorry, this is getting out of hand)
- I think this should drastically reduce the overhead of sync_lgt_data and update_metaData as their usage was reduced
- for adapt_tree_CVS I fixed a bug where interior blocks tried to sync with finer neighbors; this should not be allowed, as we would need to respect it with the CE, and the less CE the better (just at the leaf-layer where we can't avoid it)
- a few more family functions
- some functions were adapted to handle the new refinement flag that basically tells us a block exists but has trash in it
- this sounds confusing, but basically we decompose all leaf blocks first, as this is the most expensive thing to do, and then go level-wise, updating the mother blocks and decomposing them until all are there
- with this we do not need to do any lgt_data sync within the decomposition or reconstruction loop, which is very nice for parallelization
- added a prototype for refinement for only significant blocks but this is very experimental
- old adapt_tree loop is made obsolete and exchanged with new full tree version
- why? There was a bug in there, not a too bad one, but then we made the design choice to just stick with the full wavelet transformation, as I will focus on this anyway and it is clearer
- this was made even more clean, we now have 3 grid updates only and no lgt_data sync in decomposition or reconstruction loops
- I polished the functions a bit, did some renaming and moving around. Coarsening_indicator_tree looks cleaner now
- some flags are still to be moved but the refinement flag hell was dampened a bit
- increased the block limit checker in the main loop to accommodate for the increased amount of blocks in the full wavelet transformation (factor 8/7 or 4/3)
- updating test results for new full wavelet transformation adaption
- update test initial conditions to be more complex, this takes a bit more time but catches more things which are going on
- updated blockmin to what I've found from unit tests
- image denoising test in post processing
- got rid of all weird temporary refinement flags that I have introduced
- I love how clean it is starting to look
- new insect tests for dry-run in order to verify the creation and movement of objects
- I revisited the block estimate and adapted it to recent changes. Now we can squeeze more blocks in. Since we are seldom memory-bound this is probably only nice for development testing
- seems like I forgot some test results so in this commit they are all included
@@ -160,9 +160,9 @@ subroutine setInitialCondition_tree(params, hvy_block, tree_ID, adapt, time, ite
 ! NOTE: the grid adaptation can take the mask function into account (such that the fluid/solid
 ! interface is on the finest level).
 if (present(hvy_mask) .and. params%threshold_mask) then
-    call adapt_tree( time, params, hvy_block, tree_ID, params%coarsening_indicator_inicond, hvy_tmp, hvy_mask=hvy_mask )
+    call adapt_tree( time, params, hvy_block, tree_ID, params%coarsening_indicator, hvy_tmp, hvy_mask=hvy_mask)
Collaborator:

why has this been changed?

Collaborator (Author):

Good catch, I accidentally overwrote it by copying.
In master it is currently only applied when NOT reading in from file. Is this correct, or should it be applied for both?

 ! refine the mesh, normally "everywhere"
-call refine_tree( params, hvy_block, hvy_tmp, "everywhere", tree_ID=tree_ID_flow )
+if (params%threshold_mask) then
+    call refine_tree( params, hvy_block, hvy_tmp, "everywhere", tree_ID=tree_ID_flow, hvy_mask=hvy_mask )
Collaborator:

this is for your experimental "refine_selective" indicator, I assume ?

Collaborator (Author):

Exactly, selective indicator needs mask for coarsening indicator so it is passed on here.

end function
!> Convert a neighborhood to a different lvl_diff and give back the lower bound of possible indices.
!> This is needed as neighborhood relations might have multiple counterparts (up to 4)
Collaborator:

I don't get what this function does. can you add a few more words?

Collaborator (Author):

Any neighborhood patch can have a specific level-difference. However, for different level-differences those patches are occupied differently. For example, a left (-x) neighbor with possible indices 1-4 looks as follows:

lvl_diff = 0 (same level): entry 1 is occupied; for lvl_diff = 0 it is always the first available number of a patch
lvl_diff = +1 (coarser neighbor): one of the entries 1-4 is occupied. It depends on which part of the face of the larger block faces us as a neighbor (-y-z, +y-z, -y+z, +y+z) and the number is assigned accordingly
lvl_diff = -1 (finer neighbor): all entries 1-4 are occupied by different blocks. Their number tells us on which quarter of the face they lie (again -y-z, +y-z, -y+z, +y+z)

When I now want to know "Do I have a same-level neighbor for the same patch?" or "Do I have a coarser neighbor for the same patch?", the easiest would be to add or subtract multiples of 56, but I might miss neighbors that do not have exactly this difference (going from 1 to 56+2, for example).
I therefore simply compute the bounds of possible indices (1 and 4) and check if ANY of those are taken with the given level-difference. If one of them is occupied, I know that there is a neighbor. For faces we have 4 possible indices, for edges 2 and for corners 1.
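The bound computation described here can be sketched as follows (a hypothetical Python stand-in for the Fortran function, using the grouping of 4/2/1 indices per face/edge/corner from the code table in the PR description):

```python
# Hypothetical sketch of the lookup described above: given a neighborhood
# code, return the (lower, upper) bound of codes that could hold a neighbor
# for the same patch at a given level difference. Names are illustrative.

def patch_index_bounds(code, lvl_diff):
    """Bounds of possible neighbor codes for the patch containing `code`."""
    base = (code - 1) % 56 + 1                # strip any existing level offset
    if base <= 24:                            # faces: groups of 4 indices
        lo, width = (base - 1) // 4 * 4 + 1, 4
    elif base <= 48:                          # edges: groups of 2 indices
        lo, width = 25 + (base - 25) // 2 * 2, 2
    else:                                     # corners: a single index
        lo, width = base, 1
    offset = {0: 0, +1: 56, -1: 112}[lvl_diff]
    return offset + lo, offset + lo + width - 1
```

For the -x face example above, `patch_index_bounds(1, 0)` gives (1, 4), and asking for the coarser counterpart of the same patch, `patch_index_bounds(3, +1)`, gives (57, 60); checking whether any code in that range is occupied answers the "do I have a coarser neighbor here" question.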

- obviously, JMax of a simulation can still not be smaller than JMaxActive of the file, because I am not a magician who wants to make blocks disappear
- the rest works fine, for both treecode representations in files
! values from MPI - reference at module_mpi%init_ghost_nodes
mem_per_block = mem_per_block &
    + 2.0 * real(N_MAX_COMPONENTS) * real(product(Bs(1:dim)+2*g) - product(Bs(1:dim))) & ! real buffer ghosts, 1send 1recv
    + 3.0 * real(max_neighbors_used) * 5.0 / 2.0 & ! int buffer (4byte hence /2), sending 6 values each, 3 buffers exist
Collaborator:

just a detail, but if it's 6 values each, then 5.0 --> 6.0, right?

Collaborator (Author):

I looked at it and made sure it coincides with what is in module_mpi:

  • One buffer has 7 values each
  • The other has 4 or 5 (DEV)

max_family = 9
max_neighbors = 56*2 + 32 ! 2D has only 32 possibilities but 56 from 3D is used
max_neighbors_used = 56 + 24 + 24 ! lvl J+1, J, J-1 for full tree grid
Collaborator:

is this the maximum number of possibly active neighbors? i.e. a block can have at most 56 + 24 + 24 neighbors on the three levels?

Collaborator (Author):

Yes!

  • Level J+1 can have all possible relations filled (56 in 3D and 12 in 2D)
  • Level J can have 24 relations occupied (8 in 2D), which is 1 for each patch
  • Level J-1 has at most 24 relations? In practice I think even drastically fewer? I guess it could be 3 for 2D as illustrated below; for 3D I don't know (6 maybe?), but I just cap it at 24 (and 8 for 2D)
1122
1122
33BX
33XX

B: block where we look at neighbors
1-3: Neighbors on level J-1
X: sisters of B on level J

logical, intent(in), optional :: no_deletion
character(len=*) :: sync_case !< String representing which kind of syncing we want to do
!> Additional value to be considered for the syncing logic; can be the level or the refinement status from which should be synced, dependent on the sync case
integer(kind=ik), intent(in), optional :: s_val
Collaborator:

It looks to me like this is not actually optional, or otherwise an IF PRESENT(s_val) should appear. Maybe just omit the OPTIONAL?

Also, please add a few lines on what sync_case does.
sync_case='level' -> only sync blocks on the given level "s_val"
sync_case='ref' -> only sync blocks with identical refinement status "s_val"

is that correct?

Collaborator (Author):

Good catch, it is not optional.
At first I thought to include a sync case "full", like for ghost point syncing, but eventually I decided it makes no sense for family sync on a full tree grid.
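The sync_case semantics the reviewer paraphrases can be sketched as a simple dispatch (hypothetical names; the real routine lives in WABBIT's Fortran sync code, and this only mirrors the selection logic, not the MPI exchange):

```python
# Sketch of the sync_case selection as paraphrased in the review above:
# 'level' selects blocks on level s_val, 'ref' selects blocks whose
# refinement status equals s_val. Block is a stand-in for lgt-data.

from collections import namedtuple

Block = namedtuple("Block", ["level", "ref_status"])

def participates(block, sync_case, s_val):
    """Decide whether a block takes part in the requested family sync."""
    if sync_case == "level":
        return block.level == s_val
    if sync_case == "ref":
        return block.ref_status == s_val
    raise ValueError(f"unknown sync_case: {sync_case}")
```

A dispatch on a descriptive string, as the PR introduces, keeps the call sites readable compared to overloading a bare integer with several meanings.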

@tommy-engels tommy-engels merged commit 28cb5dc into adaptive-cfd:master Sep 30, 2024