[Project] Collective Export and Import #7475

Open
5 of 9 tasks
znarf opened this issue Jul 18, 2024 · 5 comments

znarf commented Jul 18, 2024

To push forward decentralization, user independence, and GDPR compliance, we want to allow users to export the data from their collective and import it into a different instance of our open source software.

Engineering plan for this sprint

  1. 3 of 3
    api enhancement security
    Betree
  2. 0 of 3
    api enhancement
    hdiniz
  3. api test
    kewitz

Extra

  1. api enhancement
    kewitz
  2. api enhancement performance
    hdiniz
  3. api performance
    Betree

Follow-ups

  1. api project technical-debt

Edit by @Betree: updated issue with the plan.

znarf added the project label Jul 18, 2024
znarf added this to the Y24C5 milestone Jul 18, 2024

Betree commented Jul 22, 2024

The engineering team has decided on a plan for this project. Some assumptions first:

  • Passwords will never be exported. Users will have to re-confirm their email on the new platform.
  • The first solution will take the shape of simple import/export scripts meant to be used by engineers only
  • We want to use standard, human-readable, GDPR-compliant formats (JSON, JSONL)
  • We won't assume users will be starting with a fresh instance (avoid collisions of IDs)
  • In the future, we may want to be aware of other instances (i.e. need to identify a transaction in a network of instances)
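
The ID-collision assumption above implies that an import cannot reuse the primary keys found in the export. A minimal sketch of one way to handle this, assuming every exported row carries an `id` column and that foreign keys are described by a hypothetical `FOREIGN_KEYS` map (the table and column names are illustrative, not the project's actual schema):

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical foreign-key layout: table -> {column: referenced table}.
FOREIGN_KEYS: Dict[str, Dict[str, str]] = {
    "Collectives": {},
    "Transactions": {"CollectiveId": "Collectives"},
}


def remap_ids(
    rows_by_table: Dict[str, List[Row]],
    next_free_id: Dict[str, int],
) -> Dict[str, List[Row]]:
    """Give every imported row a fresh ID and rewrite foreign keys to match.

    `next_free_id` holds, per table, the first ID that is unused on the
    target instance. The inputs are not modified.
    """
    counters = dict(next_free_id)
    id_map: Dict[Tuple[str, Any], int] = {}  # (table, old id) -> new id

    # First pass: allocate a new primary key for every exported row.
    for table, rows in rows_by_table.items():
        for row in rows:
            id_map[(table, row["id"])] = counters[table]
            counters[table] += 1

    # Second pass: copy each row, swapping in the new primary and foreign keys.
    remapped: Dict[str, List[Row]] = {}
    for table, rows in rows_by_table.items():
        remapped[table] = []
        for row in rows:
            new_row = dict(row)
            new_row["id"] = id_map[(table, row["id"])]
            for column, target in FOREIGN_KEYS.get(table, {}).items():
                old_ref = row.get(column)
                if old_ref is not None:
                    # Raises KeyError if the referenced row was not exported.
                    new_row[column] = id_map[(target, old_ref)]
            remapped[table].append(new_row)
    return remapped
```

Allocating all new IDs before rewriting any reference means the order of tables in the export does not matter. A real import would more likely let the database assign the IDs and record the mapping as rows are inserted.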

Based on these assumptions, three approaches were discussed:

  1. GraphQL approach
    • Format: defined by the queries
    • Pros:
      • Re-use permission logic to filter out data (e.g. a payment method)
      • Simple export implementation
      • Format is consistent with our public API
    • Cons:
      • Business cost is moved to import (need to map entities from GraphQL to DB)
      • GraphQL will need adaptations to work properly (e.g. surface data, hack limits)
      • Not everything is exposed on GraphQL today, and we may not want to expose everything (histories, legal docs, PayPal plans, PayPal products, etc.). Yet the best guarantee that the platform will continue to work is to have everything.
      • Limited control over performance
      • Will eventually lead to enormous exports (a lot of duplication)
      • The queries to get everything would actually be quite complex to write
  2. Query the data directly
    • Format: JSONL, one file per table, 1 row per entry
    • Risk: Not implementing permissions correctly could lead to leaked data
    • Pros:
      • Simplified import
      • Guarantee that all data is there
      • We can build upon @kewitz's existing work, which already covers the base MVP for this project
  3. Use pg_dump/pg_restore
    • Risks: Not filtering enough => exporting too much (security)
    • Pros:
      • Simple export & import
    • Cons:
      • Non-standard format
      • Auto-increment ID conflict issues

Based on engineering discussions, we have decided to build upon @kewitz's existing work, which already implements the 2nd approach. However, we recognize that the security concerns raised are valid, and we will pay special attention to making sure we don't export more than we should. Whenever possible, we'll try to re-use and share the permission logic used by GraphQL.
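
To illustrate the 2nd approach together with the security concern, here is a minimal sketch of a JSONL export (one file per table, one row per line) that only writes columns from an explicit allow-list. The `EXPORTABLE_COLUMNS` map and the table and column names are hypothetical; this is not @kewitz's implementation:

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

# Hypothetical allow-list: only these columns ever reach the export files.
EXPORTABLE_COLUMNS: Dict[str, List[str]] = {
    "Users": ["id", "email", "CollectiveId"],  # deliberately no password hash
    "Collectives": ["id", "slug", "name"],
}


def filter_row(table: str, row: Row) -> Row:
    """Keep only allow-listed columns; unknown tables raise KeyError."""
    columns = EXPORTABLE_COLUMNS[table]
    return {column: row[column] for column in columns if column in row}


def export_table(table: str, rows: Iterable[Row], out_dir: Path) -> Path:
    """Write `<table>.jsonl` into `out_dir`, one JSON object per line."""
    if table not in EXPORTABLE_COLUMNS:
        # Fail closed: refuse to create a file for a table with no allow-list.
        raise KeyError(f"no export allow-list defined for table {table!r}")
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{table}.jsonl"
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            line = json.dumps(filter_row(table, row), ensure_ascii=False)
            handle.write(line + "\n")
    return path
```

With an allow-list, a newly added column or table stays out of the export until someone opts it in, which is the safer default given the risk of leaked data noted above. Row-level permissions (which rows a given user may export) would still need to be handled separately.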

Betree commented Aug 5, 2024

Last week

  • @hdiniz implemented zip file support

This week

Betree commented Aug 12, 2024

Projects
Status: 💚 On track

4 participants