← index #334PR #509
Related · high · value 0.225
QUERY · ISSUE

umqtt (with tls): check_msg() / wait_msg() - how to know connection is gone? timeout?

openby mirkoopened 2019-03-15updated 2024-11-01

I'm using the umqtt module - on conjunction with ussl_mbedtls - within the UNIX port with mbedtls.
When I re-route the traffic originally going to my MQTT broker to /dev/null (e.g. 127.0.0.1), check_msg() as well as wait_msg() continue behaving as before. No different return value, no exception thrown.
From the umqtt.robust implementation I figure that should be the case though.
I also thought this might be due to an overly long timeout, so I called settimeout(X) on the underlying socket, but then wait_msg() throws an exception right after X seconds - no matter what - which I also appears weird to me.

What's the way to go to catch network issues and potentially reconnect automatically when using umqtt.{simple,robust} with TLS?

3 comments
jonnor · 2024-08-25

Hi. Is this still an issue with the latest MicroPython and mqtt.simple/mqtt.robust versions?

mirko · 2024-08-25

It's been more than 5 years, I honestly have no idea :)

prabhu-yu · 2024-11-01

hi @mirko , @jonnor
I have submitted a fix here.
https://github.com/micropython/micropython-lib/pull/890/files

CANDIDATE · PULL REQUEST

umqtt.robust: Fix check_msg blocking after reconnect

closedby ian-llewellynopened 2022-07-18updated 2022-10-04

After reconnect(), MQTTClient.socket is blocking by default. This
commit aims to supplement that behaviour at times when check_msg()
sets the socket to non-blocking. It may fix errors reported in #102 and
#192.

This fix:

  • avoids using an additional instance attribute to record the intended
    state of the socket (Edit: this would involve changes to umqtt.simple which should 'know' nothing about robustness)
  • only adds two additional lines of code to one file in the codebase (Edit: a new approach was taken based on feedback: it's now 8 additional code lines)
  • depends on socket's __str__() method to retrieve the current timeout
    value: <socket state=0 timeout=-1 incoming=0 off=0> - not ideal
  • (Edit: overrides umqtt.simple's check_msg() method in order to sock.setblocking(False) before all calls to wait_msg(), even after reconnect())

Edit: An additional note: This fix calls super().wait_msg() as opposed to self.wait_msg() to avoid nested reconnect() retries which would simply reintroduce the original behaviour.

10 comments
ian-llewellyn · 2022-07-18

I have since read @pfalcon's comment in PR #117:

guarantee that check_msg() return immediately and can be used in main loops doing something else (otherwise, check_msg() may block your main loop indefinitely)

This fix would contradict that aim. However, publish() could be considered similarly and it is allowed to block the main loop indefinitely.

andrewleech · 2022-07-18

Hi, that's interesting that the socket timeout is lost after reconnect, I feel that's perhaps an upstream bug that should be addressed.

In the mean time, re-setting the timeout certainly seems appropriate.

Personally I think adding a class attribute to hold the timeout would be better than parsing the str(), the parsing is less likely to work across multiple micropython versions and fairly expensive in terms of CPU cycles really.
Cashing one more float in a class is quite a small amount of ram - through I applaud your efforts at minimising code size / change.

andrewleech · 2022-07-18

I have since read @pfalcon's comment in PR #117:

guarantee that check_msg() return immediately and can be used in main loops doing something else (otherwise, check_msg() may block your main loop indefinitely)

This fix would contradict that aim.

I can't see how your change here has affected any behaviour for check_msg? You're just adding to wait_msg but even then there's no new delays?

ian-llewellyn · 2022-07-19

Many thanks for the feedback @andrewleech. I've had another look at the code with your comments in mind.

The setblocking() calls are both executed in the umqtt.simple module which knows nothing about "robustness"; storing additional variables in that class, for this purpose seems... improper! Agreed though, the str() parsing approach isn't great.

I looked at a couple of other ways to take care of it and settled on this idea: override check_msg() in umqtt.robust, implement the retry loop in there with self.sock.setblocking(False) as the first call and then return super().wait_msg() in the try: except: (same pattern as the overridden wait_msg method). That eliminates the parsing as per my previous attempt, confines the changes to umqtt.robust, continues the pattern of overriding methods to make them more robust and negates the need to add class variables to 'track' the desired socket state. It's a 7 line override excluding the def line - how do you think that stacks up RAM-usage-wise to the alternative of storing a class variable in umqtt.simple?

My last question is this: Is it best to open a completely new PR for this since the approach is different, or squash this on top of the code you've already seen and force push?

In reply to your second comment:

I can't see how your change here has affected any behaviour for check_msg? You're just adding to wait_msg but even then there's no new delays?

Yes and no. Without my change, check_msg is unintentionally converted to wait_msg after a reconnect - so that's a bug resulting in check_msg not returning to the main loop immediately. My change intends to address that accidental conversion, but it too will not return to the main loop if it cannot reconnect. (Edit: reconnect() is not in a try/except, so it will return control to the main loop). I think that's okay though: As per the note in robust's README.rst:

any realistic application would subclass umqtt.robust.MQTTClient class and override delay() and reconnect() methods to suit particular usage scenario.

andrewleech · 2022-07-21

The setblocking() calls are both executed in the umqtt.simple module which knows nothing about "robustness"; storing additional variables in that class, for this purpose seems... improper! Agreed though, the str() parsing approach isn't great.

I looked at a couple of other ways to take care of it and settled on this idea: override check_msg() in umqtt.robust, implement the retry loop in there with self.sock.setblocking(False) as the first call and then return super().wait_msg() in the try: except: (same pattern as the overridden wait_msg method). That eliminates the parsing as per my previous attempt, confines the changes to umqtt.robust, continues the pattern of overriding methods to make them more robust and negates the need to add class variables to 'track' the desired socket state. It's a 7 line override excluding the def line - how do you think that stacks up RAM-usage-wise to the alternative of storing a class variable in umqtt.simple?

Yes and no. Without my change, check_msg is unintentionally converted to wait_msg after a reconnect - so that's a bug resulting in check_msg not returning to the main loop immediately. My change intends to address that accidental conversion, ~but it too will not return to the main loop if it cannot reconnect~. (Edit: reconnect() is not in a try/except, so it will return control to the main loop). I think that's okay though: As per the note in robust's README.rst:

any realistic application would subclass umqtt.robust.MQTTClient class and override delay() and reconnect() methods to suit particular usage scenario.

Ok, I'm not familiar with the imqtt library myself so lets map out how it's supposed to work.
In simple.py:

  • wait_msg() it relies on previously set blocking behavior to read the first byte, turns on blocking, then reads the rest of the packet if there was a byte or returns None if there wasn't.
  • check_msg() explicitely turns off blocking then calls wait_msg() which will immediately return with None if there's nothing already waiting.

An important point here is that by default, TCP sockets are placed in a blocking mode. So on startup the first call to wait_msg() will already be in blocking mode.

Moving on to robust.py:

  • wait_msg() from simple is wrapped to run in a loop, reconnecting if there was a failure an retrying the underlying wait_msg().
  • The problem you're addressing is that reconnect will re-instate the blocking behaviour and loop forever even when called from check_msg().
  • wait_msg() probably should have a timeout on the loop to not block forever, but that's a separate discussion.

I understand the root problem now :-)
I think this newer method has merit, however still isn't really correct to the intended behaviour of the functions as you suggest - I'll make some review notes in the code.

My last question is this: Is it best to open a completely new PR for this since the approach is different, or squash this on top of the code you've already seen and force push?

Sorry I didn't get back to you in time, but yes I commonly replace changes with completely new ones and squash / force-push to github. These github PR's do let you still view earlier changesets so it's still possible to review them.

ian-llewellyn · 2022-08-23

@andrewleech, I'm resurrecting our month old conversation! :-)

check_msg should not continue to loop

So a trivial code change brings your suggestion to fruition while also making check_msg() more 'configurable' for those who want it:

  • By default, check_msg will make a second attempt to check for a message if a reconnect is necessary. If unsuccessful, it will give up and return to the caller.
  • Calling check_msg(attempts=x) sets the number of attempts to be made.
  • Calling check_msg(attempts=-1) instructs check_msg not to return until the connection is re-established and the function is otherwise successful.

What do you think?

andrewleech · 2022-09-13

@ian-llewellyn This does look quite good to me, nice cleanup. Keeps it all quite simple and I feel true-to-intention.

ian-llewellyn · 2022-09-27

Thank for the positive discourse @andrewleech. I saw that master has moved on a bit since my initial PR, so I've rebased, updated manifest.py and force pushed. Hopefully this is now ready for a final review and merge. :crossed_fingers:

andrewleech · 2022-09-27

Looks good to me, I'll have to leave it to @jimmo for a final review.

dpgeorge · 2022-10-04

Thanks for the contribution. There is definitely a bug here, check_msg() shouldn't block.

The fix here looks good. Rebased and merged in 4dc2d5e17f1dfa0ca8af731cb1b6eec437731b25

Keyboard

j / / n
next pair
k / / p
previous pair
1 / / h
show query pane
2 / / l
show candidate pane
c
copy suggested comment
r
toggle reasoning
g i
go to index
?
show this help
esc
close overlays

press ? or esc to close

copied