Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KubernetesPodOperator truncates logs #10586

Closed
michalmisiewicz opened this issue Aug 26, 2020 · 11 comments
Closed

KubernetesPodOperator truncates logs #10586

michalmisiewicz opened this issue Aug 26, 2020 · 11 comments
Assignees
Labels
kind:bug This is a clearly a bug provider:cncf-kubernetes Kubernetes provider related issues

Comments

@michalmisiewicz
Copy link
Contributor

michalmisiewicz commented Aug 26, 2020

Apache Airflow version: 1.10.10

Kubernetes version (if you are using kubernetes) (use kubectl version): v1.15.11

KubernetesPodOperator truncates logs when container produces more than 10 lines of logs before execution of read_pod_logs function. Is there any reason to make 10 as default value for tail_lines argument ?

    def read_pod_logs(self, pod: V1Pod, tail_lines: int = 10):
        """Reads log from the POD"""
        try:
            return self._client.read_namespaced_pod_log(
                name=pod.metadata.name,
                namespace=pod.metadata.namespace,
                container='base',
                follow=True,
                tail_lines=tail_lines,
                _preload_content=False
            )
        except BaseHTTPError as e:
            raise AirflowException(
                'There was an error reading the kubernetes API: {}'.format(e)
            )
@michalmisiewicz michalmisiewicz added the kind:bug This is a clearly a bug label Aug 26, 2020
@michalmisiewicz
Copy link
Contributor Author

Looking on previous K8S related PRs I would like to ask @kaxil for help 🙂

@kaxil
Copy link
Member

kaxil commented Aug 26, 2020

tagging in @dimberman if you have any idea about this

@SamWheating
Copy link
Contributor

SamWheating commented Aug 31, 2020

👋

I've been running into this issue and we've created a modified copy of k8spodoperator as a workaround. I'd be interested in making a PR to expose the tail_lines property to the user in the k8sPodOperator since this would allow us to get back onto the official version of the operator in future releases.

If its alright with you, you can assign this issue to me and I can get a PR up hopefully sometime this week?

@dimberman
Copy link
Contributor

@SamWheating assigned :). Could you please make sure that this issue is persistent in 1.10.12?

@SamWheating
Copy link
Contributor

Yup, I'm just updating our clusters to 1.10.12 and then I'll try to replicate the error case (a pod producing a lot of logs quickly and then shutting down).

@SamWheating
Copy link
Contributor

Replicated the issue in Airflow 1.10.12 like so:

Created a DAG to really quickly log a bunch of stuff in a k8s pod:

from datetime import timedelta

from airflow import models
from airflow import utils
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

dag = models.DAG(
    'data-infrastructure-examples.sam-k8s-fast-logging',
    start_date=utils.dates.days_ago(1),
    max_active_runs=1,
    dagrun_timeout=timedelta(minutes=1),
    schedule_interval=None
)

hello_world_pod = KubernetesPodOperator(namespace='data-infrastructure-examples',
                                        image="busybox",
                                        name="count-to-10000",
                                        in_cluster=True,
                                        task_id="count-to-10000",
                                        get_logs=True,
                                        dag=dag,
                                        is_delete_operator_pod=False,
                                        cmds=['seq', '10000']
                                        )

Running the DAG then only provides the last 10 lines of the log in the airflow UI:

[2020-08-31 18:08:08,673] {taskinstance.py:901} INFO - Executing <Task(KubernetesPodOperator): count-to-10000> on 2020-08-31T18:08:01.278616+00:00
[2020-08-31 18:08:08,678] {standard_task_runner.py:54} INFO - Started process 34196 to run task
[2020-08-31 18:08:08,848] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'data-infrastructure-examples.sam-k8s-fast-logging', 'count-to-10000', '2020-08-31T18:08:01.278616+00:00', '--job_id', '57658', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/data_infrastructure_examples/kubernetes_fast_logs.py', '--cfg_path', '/tmp/tmp1jvekhel']
[2020-08-31 18:08:08,849] {standard_task_runner.py:78} INFO - Job 57658: Subtask count-to-10000
[2020-08-31 18:08:08,955] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: data-infrastructure-examples.sam-k8s-fast-logging.count-to-10000 2020-08-31T18:08:01.278616+00:00 [running]> airflow-worker-7dcb77d8dc-bfxnw
[2020-08-31 18:08:09,027] {logging_mixin.py:112} WARNING - /usr/local/lib/python3.7/site-packages/airflow/kubernetes/pod_launcher.py:330: DeprecationWarning: Using `airflow.contrib.kubernetes.pod.Pod` is deprecated. Please use `k8s.V1Pod`.
  security_context=_extract_security_context(pod.spec.security_context)
[2020-08-31 18:08:09,027] {logging_mixin.py:112} WARNING - /usr/local/lib/python3.7/site-packages/airflow/kubernetes/pod_launcher.py:77: DeprecationWarning: Using `airflow.contrib.kubernetes.pod.Pod` is deprecated. Please use `k8s.V1Pod` instead.
  pod = self._mutate_pod_backcompat(pod)
[2020-08-31 18:08:09,699] {pod_launcher.py:173} INFO - Event: count-to-10000-f146290239a24243b48f8dbf0a26c8e5 had an event of type Pending
[2020-08-31 18:08:09,699] {pod_launcher.py:139} WARNING - Pod not yet started: count-to-10000-f146290239a24243b48f8dbf0a26c8e5
[2020-08-31 18:08:10,709] {pod_launcher.py:173} INFO - Event: count-to-10000-f146290239a24243b48f8dbf0a26c8e5 had an event of type Pending
[2020-08-31 18:08:10,709] {pod_launcher.py:139} WARNING - Pod not yet started: count-to-10000-f146290239a24243b48f8dbf0a26c8e5
[2020-08-31 18:08:11,716] {pod_launcher.py:173} INFO - Event: count-to-10000-f146290239a24243b48f8dbf0a26c8e5 had an event of type Pending
[2020-08-31 18:08:11,716] {pod_launcher.py:139} WARNING - Pod not yet started: count-to-10000-f146290239a24243b48f8dbf0a26c8e5
[2020-08-31 18:08:12,725] {pod_launcher.py:173} INFO - Event: count-to-10000-f146290239a24243b48f8dbf0a26c8e5 had an event of type Succeeded
[2020-08-31 18:08:12,726] {pod_launcher.py:287} INFO - Event with job id count-to-10000-f146290239a24243b48f8dbf0a26c8e5 Succeeded
[2020-08-31 18:08:12,751] {pod_launcher.py:156} INFO - b'9991\n'
[2020-08-31 18:08:12,751] {pod_launcher.py:156} INFO - b'9992\n'
[2020-08-31 18:08:12,751] {pod_launcher.py:156} INFO - b'9993\n'
[2020-08-31 18:08:12,751] {pod_launcher.py:156} INFO - b'9994\n'
[2020-08-31 18:08:12,752] {pod_launcher.py:156} INFO - b'9995\n'
[2020-08-31 18:08:12,752] {pod_launcher.py:156} INFO - b'9996\n'
[2020-08-31 18:08:12,752] {pod_launcher.py:156} INFO - b'9997\n'
[2020-08-31 18:08:12,752] {pod_launcher.py:156} INFO - b'9998\n'
[2020-08-31 18:08:12,752] {pod_launcher.py:156} INFO - b'9999\n'
[2020-08-31 18:08:12,752] {pod_launcher.py:156} INFO - b'10000\n'
[2020-08-31 18:08:12,766] {pod_launcher.py:173} INFO - Event: count-to-10000-f146290239a24243b48f8dbf0a26c8e5 had an event of type Succeeded
[2020-08-31 18:08:12,766] {pod_launcher.py:287} INFO - Event with job id count-to-10000-f146290239a24243b48f8dbf0a26c8e5 Succeeded
[2020-08-31 18:08:12,772] {pod_launcher.py:173} INFO - Event: count-to-10000-f146290239a24243b48f8dbf0a26c8e5 had an event of type Succeeded
[2020-08-31 18:08:12,772] {pod_launcher.py:287} INFO - Event with job id count-to-10000-f146290239a24243b48f8dbf0a26c8e5 Succeeded
[2020-08-31 18:08:12,793] {taskinstance.py:1070} INFO - Marking task as SUCCESS.dag_id=data-infrastructure-examples.sam-k8s-fast-logging, task_id=count-to-10000, execution_date=20200831T180801, start_date=20200831T180808, end_date=20200831T180812
[2020-08-31 18:08:13,568] {local_task_job.py:102} INFO - Task exited with return code 0

While checking the logs on the completed pod show the entire count:

> kubectl logs -f -n data-infrastructure-examples count-to-10000-f146290239a24243b48f8dbf0a26c8e5
1
2
3
4
5
6
(all the other numbers in the sequence)
9999
10000

@michalmisiewicz
Copy link
Contributor Author

I'd be interested in making a PR to expose the tail_lines property to the user in the k8sPodOperator since this would allow us to get back onto the official version of the operator in future releases.

The fix should be applied also for KubernetesExecutor 🙂
I'm now testing stability of KubernetesExecutor with tail_lines equal to 0.

@SamWheating
Copy link
Contributor

Sounds good. From the kubernetes python client source:

:param int tail_lines: If set, the number of lines from the end of the logs to show. If not specified, logs are shown from the creation of the container or sinceSeconds or sinceTime

Is there a reason users would ever want truncate lines? Maybe its just easier to not set this value and retrieve all logs in all cases.

@dimberman
Copy link
Contributor

@SamWheating I'd say retrieve all logs in all cases. The fact that these are truncated is a bug. Glad to help whenever you have a PR ready!

@dobromirmontauk
Copy link

Would love to see a fix to this!

@michalmisiewicz
Copy link
Contributor Author

Fixed in #11325

@kaxil kaxil added provider:cncf-kubernetes Kubernetes provider related issues and removed area:k8s labels Nov 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug This is a clearly a bug provider:cncf-kubernetes Kubernetes provider related issues
Projects
None yet
Development

No branches or pull requests

6 participants
  翻译: