Record logger name as the instrumentation scope name #3810

tm0nk · 2024-03-24T00:24:36Z

Description

Fixes issue #2485: Record logger name as the instrumentation scope name.

Approach: Cache one Logger object per Python logger name in LoggingHandler. The @lru_cache decorator on get_logger requires Python 3.2 or later.

The OpenTelemetry spec specifies that the Logger Name SHOULD be recorded as the Instrumentation Scope name. Reference: open-telemetry/opentelemetry-specification#2359

This has already been implemented in OpenTelemetry Java, but not in OpenTelemetry Python.
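
The caching approach described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the SDK code: `FakeLogger` and `FakeLoggerProvider` are hypothetical stand-ins for the real classes, and only the per-name caching behavior is shown.

```python
from functools import lru_cache


class FakeLogger:
    """Hypothetical stand-in for an SDK Logger; it only stores its scope name."""

    def __init__(self, name: str) -> None:
        self.name = name


class FakeLoggerProvider:
    """Hypothetical stand-in for a LoggerProvider."""

    # Unbounded cache: one entry per distinct (self, name) pair.
    @lru_cache(maxsize=None)
    def get_logger(self, name: str) -> FakeLogger:
        return FakeLogger(name)


provider = FakeLoggerProvider()
first = provider.get_logger("app.db")
second = provider.get_logger("app.db")   # served from the cache
other = provider.get_logger("app.http")  # different name, different Logger
```

Note that an unbounded `lru_cache` on a method also holds a reference to `self`, which is what lint rules such as method-cache-max-size-none warn about.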

Type of change

Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

I've modified many of the tests in test_export.py to include an assert that tests that the logger name and instrumentation scope name are the same.

Will be happy to modify others to do the same validation.

Does This PR Require a Contrib Repo Change?

No, this brings OpenTelemetry Python in line with the existing OpenTelemetry spec.

Checklist:

Followed the style guidelines of this project
Changelogs have been updated
Unit tests have been added
Documentation has been updated

linux-foundation-easycla · 2024-03-24T00:24:42Z

The committers listed above are authorized under a signed CLA.

✅ login: lzchen / name: Leighton Chen (34b9f1b)
✅ login: tm0nk / name: Tom Monk (133bc7c, 0576859)

srikanthccv · 2024-03-25T10:19:53Z

Overall LGTM, Please address the failed checks.

tm0nk · 2024-03-25T15:13:19Z

Thanks @srikanthccv, I've addressed the failed lint check. I'm working on the missing CLA authorization. I'm contributing as a Snowflake employee, so the CLA authorization needs more than just my approval...

tm0nk · 2024-03-26T15:04:56Z

There is still one failing lint check: method-cache-max-size-none. Let me do some research on how best to size the cache. Since something similar is already implemented for OpenTelemetry Java, I may draw inspiration from there.

jeremydvoss · 2024-04-11T16:41:24Z

opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py

@@ -448,9 +449,6 @@ def __init__(
    ) -> None:
        super().__init__(level=level)
        self._logger_provider = logger_provider or get_logger_provider()
-        self._logger = get_logger(
-            __name__, logger_provider=self._logger_provider
-        )


This seems like a very significant change. But I am also confused about why the logging handler was storing a logger for its own source namespace and not even the namespace of the logger it's added to. Can anyone explain the purpose of this?

To be more clear. This seems like a good change. I'm just surprised by how strange the existing code's functionality is. So, I wanted to make sure there was not a good reason for it to be that way.

tm0nk · 2024-04-12

It seems a single logger instance stored in _logger was chosen because of a perceived need for performance, based on the discussion in the original issue #2485.

Based on the discussion yesterday, I'm planning to use pytest-benchmark to test if there is a performance regression with creating one logger instance per logger name.

lzchen · 2024-04-23

@tm0nk

Are you planning on including these tests as part of this pr?
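
To make the performance question in this thread concrete, here is a rough sketch of the comparison. It uses the standard library's `timeit` rather than pytest-benchmark (which is what was proposed above), and `FakeLogger`, `get_logger_uncached`, `get_logger_cached` and `emit_many` are hypothetical names, not SDK functions. Absolute timings will vary by machine.

```python
import timeit
from functools import lru_cache


class FakeLogger:
    """Hypothetical stand-in for an SDK Logger."""

    def __init__(self, name: str) -> None:
        self.name = name


def get_logger_uncached(name: str) -> FakeLogger:
    # Creates a new object on every call.
    return FakeLogger(name)


@lru_cache(maxsize=None)
def get_logger_cached(name: str) -> FakeLogger:
    # Creates one object per distinct name, then reuses it.
    return FakeLogger(name)


def emit_many(get_logger, names, repeats: int = 1000) -> int:
    """Look up a logger once per simulated record; return the lookup count."""
    count = 0
    for _ in range(repeats):
        for name in names:
            get_logger(name)
            count += 1
    return count


names = [f"app.module{i}" for i in range(10)]
uncached_s = timeit.timeit(lambda: emit_many(get_logger_uncached, names), number=5)
cached_s = timeit.timeit(lambda: emit_many(get_logger_cached, names), number=5)
print(f"uncached: {uncached_s:.4f}s, cached: {cached_s:.4f}s")
```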

jeremydvoss · 2024-04-11T16:43:24Z

opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py


    def flush(self) -> None:
        """
        Flushes the logging output. Skip flushing if logger is NoOp.
        """
-        if not isinstance(self._logger, NoOpLogger):
-            self._logger_provider.force_flush()
+        self._logger_provider.force_flush()


Why remove the if statement here?

tm0nk · 2024-04-12

There is no longer a single self._logger instance to check against. Perhaps we need to keep track of all logger instances (one for each logger name encountered) and do a force_flush() if even a single one is not a NoOpLogger.

jeremydvoss · 2024-04-29

Got it. It doesn't seem like there would be an issue with flushing a no-op either, so I think this works fine. Resolved.

pmcollins · 2024-05-06

Is there an issue here with typing? The LoggerProvider is of the API variety, since it comes from the API's get_logger_provider() function, and the API type doesn't specify a force_flush() method.

jeremydvoss · 2024-04-11T16:45:48Z

opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py

@@ -618,6 +617,7 @@ def __init__(
    def resource(self):
        return self._resource

+    @lru_cache(maxsize=None)


Is the point of this only to speed up the multiple calls to get_logger? Or is there a functional benefit? Given that the logs will be coming from multiple loggers, I am not sure this cache is necessary.

tm0nk · 2024-04-12

Yes, this is only to speed up the multiple calls to get_logger. We can be more explicit in keeping track of one logger instance per logger name encountered. @lru_cache may not be the right approach here if we want to be able to retrieve all logger instances after they've been placed in the cache.

jeremydvoss · 2024-04-29

"@lru_cache may not be the right approach here if we want to be able to retrieve all logger instances after they've been placed in the cache"
Not sure I understood this. Could you explain further?

Please add cache tests and the pytest benchmarking you mentioned. Otherwise, looks great to me!
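
On the point about retrieving cached instances: `functools.lru_cache` exposes no way to list the values it holds, whereas an explicit dictionary does. A minimal sketch of that alternative follows, assuming hypothetical names (`FakeLogger`, `NoOpFakeLogger`, `LoggerCache`) that are not part of the SDK. Thread safety is deliberately left out.

```python
from typing import Callable, Dict, List


class FakeLogger:
    """Hypothetical stand-in for an SDK Logger."""

    def __init__(self, name: str) -> None:
        self.name = name


class NoOpFakeLogger(FakeLogger):
    """Hypothetical stand-in for a no-op logger."""


class LoggerCache:
    """Explicit per-name cache whose contents can be enumerated."""

    def __init__(self, factory: Callable[[str], FakeLogger]) -> None:
        self._factory = factory
        self._loggers: Dict[str, FakeLogger] = {}

    def get(self, name: str) -> FakeLogger:
        logger = self._loggers.get(name)
        if logger is None:
            # First time this name is seen: create and remember the logger.
            logger = self._factory(name)
            self._loggers[name] = logger
        return logger

    def all_loggers(self) -> List[FakeLogger]:
        return list(self._loggers.values())


cache = LoggerCache(FakeLogger)
cache.get("app.db")
cache.get("app.http")
cache.get("app.db")

# With the instances retrievable, a flush could be skipped when every
# cached logger is a no-op.
needs_flush = any(
    not isinstance(logger, NoOpFakeLogger) for logger in cache.all_loggers()
)
```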

lzchen · 2024-05-02T17:13:44Z

@tm0nk

This is a great contribution. Please add some benchmark tests to see the implications of multiple calls to get_logger and fix the build when you can. Also as discussed in the Python SIG, using a cache is probably necessary to save us from multiple creations of objects when calling get_logger multiple times.

pmcollins

Thank you for the change. I've left a couple of questions.

pmcollins · 2024-05-06T15:31:05Z

opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py

-            self._logger.emit(self._translate(record))
+        logger = get_logger(record.name, logger_provider=self._logger_provider)
+        if not isinstance(logger, NoOpLogger):
+            logger.emit(self._translate(record))

    def flush(self) -> None:
        """
        Flushes the logging output. Skip flushing if logger is NoOp.


Should we update this comment?

pmcollins · 2024-05-06T15:46:33Z

opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py


    def flush(self) -> None:
        """
        Flushes the logging output. Skip flushing if logger is NoOp.
        """
-        if not isinstance(self._logger, NoOpLogger):
-            self._logger_provider.force_flush()
+        self._logger_provider.force_flush()


Is there an issue here with typing, because the LoggerProvider is of the API variety, as it comes from the API's get_logger_provider() function, and it doesn't specify a force_flush() method.
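
One possible way to address the typing concern is to call force_flush() only when the provider actually offers it. The sketch below is an assumption about how that could look, not what the PR does: `ApiLoggerProvider`, `SdkLoggerProvider` and `flush_if_supported` are hypothetical names used for illustration.

```python
class ApiLoggerProvider:
    """Hypothetical API-level provider: it declares no force_flush()."""


class SdkLoggerProvider(ApiLoggerProvider):
    """Hypothetical SDK-level provider that adds force_flush()."""

    def __init__(self) -> None:
        self.flush_calls = 0

    def force_flush(self) -> bool:
        self.flush_calls += 1
        return True


def flush_if_supported(provider: ApiLoggerProvider) -> bool:
    """Call force_flush() if the provider has one; report whether it was called."""
    force_flush = getattr(provider, "force_flush", None)
    if callable(force_flush):
        force_flush()
        return True
    return False


api_result = flush_if_supported(ApiLoggerProvider())
sdk_provider = SdkLoggerProvider()
sdk_result = flush_if_supported(sdk_provider)
```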

lzchen · 2024-06-26T18:04:57Z

@tm0nk

Gentle ping on this issue. Are you still working on this? We will be reassigning this issue if we don't hear back from you.

emdneto · 2024-06-26T21:56:00Z

opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py

@@ -622,6 +621,7 @@ def __init__(
    def resource(self):
        return self._resource

+    @lru_cache(maxsize=None)


Since maxsize is None here, this is the same as just @cache, right?
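
For reference, `functools.cache` was added in Python 3.9 and is documented as equivalent to `lru_cache(maxsize=None)`, so it is not available on older interpreters. A small sketch of the equivalence, assuming Python 3.9 or later and using hypothetical function names:

```python
from functools import cache, lru_cache


@lru_cache(maxsize=None)
def square_lru(n: int) -> int:
    return n * n


@cache  # Python 3.9+ only
def square_cache(n: int) -> int:
    return n * n


square_lru(3)
square_cache(3)

# Both wrappers report an unbounded cache.
lru_maxsize = square_lru.cache_info().maxsize
cache_maxsize = square_cache.cache_info().maxsize
```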

lzchen · 2024-07-05T18:40:33Z

@tm0nk

Gentle ping on this. Are you still planning to work on this?

tm0nk · 2024-10-01T22:13:47Z

@sfc-gh-jopel has opened a PR with these changes:
#4208

He will be following up on getting this fix merged. Thanks @sfc-gh-jopel!

tm0nk requested a review from a team March 24, 2024 00:24

srikanthccv changed the title ~~Fix issue 2485 enable caching for get_logger calls~~ Record logger name as the instrumentation scope name Mar 25, 2024

Fix issue 2485 enable caching for get_logger calls

133bc7c

Cache one Logger object per Python logger name in LoggingHandler

tm0nk force-pushed the fix-issue-2485-instrumentation-scope-name branch from f3dc1d5 to 133bc7c Compare March 25, 2024 15:09

Add entry to CHANGELOG.md

0576859

tm0nk force-pushed the fix-issue-2485-instrumentation-scope-name branch from 70f7724 to 0576859 Compare March 25, 2024 15:40

jeremydvoss mentioned this pull request Apr 11, 2024

Stabilize Logs #3361

Open

4 tasks

jeremydvoss reviewed Apr 11, 2024

Merge branch 'main' into fix-issue-2485-instrumentation-scope-name

34b9f1b

pmcollins reviewed May 6, 2024

emdneto reviewed Jun 26, 2024

sfc-gh-jopel mentioned this pull request Oct 1, 2024

Record logger name as the instrumentation scope name #4208

Merged

7 tasks

tm0nk closed this Oct 1, 2024