Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RF03 fixes can cause incorrect alias assumptions in certain cases #5983

Open
3 tasks done
TheCleric opened this issue Jun 24, 2024 · 4 comments
Open
3 tasks done

RF03 fixes can cause incorrect alias assumptions in certain cases #5983

TheCleric opened this issue Jun 24, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@TheCleric
Copy link
Contributor

TheCleric commented Jun 24, 2024

Search before asking

  • I searched the issues and found no similar issues.

What Happened

When fixing some of our sql code we began seeing join like where clauses in subqueries suddenly "joining" a table to itself via fixed aliases. Here is a minimal reproducible example:

SELECT
    *,
    (
      SELECT C.JobId FROM customer AS C WHERE C.Id = CustomerId LIMIT 1
    ) AS CustomerId
FROM (
SELECT
    F.*,
    B.*
FROM
    foo AS F
    LEFT JOIN bar AS B ON P.Id = B.ParentId
);

I know that this query could be cleaned up and made more readable, but it is a very stripped down version of our real query which is much less easily changed.

Expected Behaviour

The code above should be left relatively (or completely) unchanged.

Observed Behaviour

SQLFluff applies the following fix that renders the query valid SQL, but without the intended results. Note that the previously unqualified column CustomerId which was referencing the outer table, has been given a qualification of the inner table C which renders it invalid.

SELECT
    *,
    (
      SELECT C.JobId FROM customer AS C WHERE C.Id = C.CustomerId LIMIT 1
    ) AS CustomerId
FROM (
SELECT
    F.*,
    B.*
FROM
    foo AS F
    LEFT JOIN bar AS B ON P.Id = B.ParentId
);

How to reproduce

Minimal reproducible example provided in the 'What Happened` section.

Any SQLfluff config with RF03 enabled should trigger it.

Dialect

ANSI (though discovered in Databricks).

Version

sqlfluff, version 3.0.7
Python 3.10.12

Configuration

Default configuration

Are you willing to work on and submit a PR to address the issue?

  • Yes I am willing to submit a PR!

Code of Conduct

@TheCleric TheCleric added the bug Something isn't working label Jun 24, 2024
@TheCleric
Copy link
Contributor Author

I think in this particular instance this may be occurring because the table in the subquery has an alias, but the parent's from clause is itself a subquery with no alias.

So according to

if query.parent:
possible_ref_tables = list(
self._iter_available_targets(query.parent)
)
it will try to get the parent aliases to see if there are multiple aliases that this column could be resolved to, but due to
if alias.ref_str:
yield alias.ref_str
we would skip empty aliases.

Therefore it counts the subquery's aliases (1) and the parent query's non-empty aliases (0) and assumes the C alias is the only valid one. So it feels like (in at least this instance) we'd want to count the non-empty aliases in determining whether we can assume the C alias is the only one for the purpose of fixing.

@TheCleric
Copy link
Contributor Author

I've got some code that MOSTLY works. Here are the relevant changes from src/sqlfluff/rules/references/RF03.py:

    def _iter_available_targets(
        self,
        query: Query,
        include_empty: bool = False,
    ) -> Iterator[str]:
        """Iterate along a list of valid alias targets."""
        for selectable in query.selectables:
            select_info = selectable.select_info
            for alias in select_info.table_aliases:
                if alias.ref_str or include_empty:
                    yield alias.ref_str

    def _visit_queries(self, query: Query, visited: set) -> Iterator[LintResult]:
        select_info: Optional[SelectStatementColumnsAndTables] = None
        if query.selectables:
            select_info = query.selectables[0].select_info
            # How many table names are visible from here? If more than one then do
            # nothing.
            if select_info and len(select_info.table_aliases) == 1:
                fixable = True
                # :TRICKY: Subqueries in the column list of a SELECT can see tables
                # in the FROM list of the containing query. Thus, count tables at
                # the *parent* query level.
                possible_ref_tables = list(self._iter_available_targets(query))
                if query.parent:
                    possible_ref_tables  = list(
                        # Due to https://github.com/sqlfluff/sqlfluff/issues/5983
                        # include empty aliases for the purposes of our count
                        # to see if this is auto-fixable, but make sure we
                        # don't count the subquery itself
                        self._iter_available_targets(
                            query.parent,
                            include_empty=True,
                        )
                    )

This seems to fix the issue but, this causes another test to begin failing, namely test_fail_single_table_mixed_qualification_of_references_subquery in RF03.yml which no longer gets its alias disambiguated. I think my solution needs an extra check to make sure that the empty ref we're counting isn't the FROM statement containing the query, but I don't think I know enough about the SQLFluff internals to do so intelligently.

@fmms
Copy link
Contributor

fmms commented Jun 24, 2024

@TheCleric did you see my #5980 ? There as well a qualified reference is used by accident as the the subquery has an alias.

@TheCleric
Copy link
Contributor Author

@TheCleric did you see my #5980 ? There as well a qualified reference is used by accident as the the subquery has an alias.

I did not. Could possibly be related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants