Opened 10 years ago

Closed 9 years ago

#28 closed defect (fixed)

Frequent Hangs

Reported by: David McKenna Owned by:
Priority: Feedback Pending Milestone: Beta1
Component: wrapper Version: Technology Preview 2
Severity: Keywords:
Cc:

Description

Hardware: AMD Athlon 64 4000+ (single core), NVidia3, Marvell Yukon NIC, NVidia CK8S audio, ATI Radeon 9600 AGP video, 3GB RAM.

Software: eCS 2.0 GA with SMP kernel and ACPI.PSD /SMP /APIC /TMR, Uniaud 1.9.5/1.9.24, DANIS 1.8.7, SNAP, SeaMonkey 2.1b2pre; only Flash 10 preview2, no other plugins.

Spending 5 minutes (or less) on any of these web sites results in a hang:

www.msn.com
www.nytimes.com
www.marketwatch.com
www.economist.com

Sometimes Ctrl+Esc brings up a dialog that allows closing SeaMonkey, but usually a reboot is required.

It happens on other sites as well, but those listed will do it every time.

Change History (32)

comment:1 Changed 10 years ago by Silvan Scherrer

Please test whether it's better in single-processor mode, i.e. with /SMP removed.

comment:2 Changed 10 years ago by David McKenna

Unfortunately, this computer will only boot in /APIC mode. It will not boot without ACPI.PSD either. I can test on my other computer (laptop) later this week.

With Rich's latest SeaMonkey (20101030120829) it seems to go a lot longer before hanging...

comment:3 Changed 10 years ago by David McKenna

Running on my laptop (eCS 2.0 and SM 2.1b2pre on AMD 64 single core, VIA chipset, 1GB RAM) shows similar behavior even without /SMP - frequent hangs...

comment:4 Changed 10 years ago by David McKenna

Version: Technology Preview 2

comment:5 Changed 10 years ago by David McKenna

I finally figured out how to get this computer to boot in /PIC mode (it only took about 3 years and required 8 LINK statements in ACPI.CFG in a specific order). Even so, I still get frequent hangs on web pages with flash content (especially nytimes.com - it really tortures the flash plugin) when using the SMP kernel.

I decided to install the UNI kernel (14.104a), and since doing that (about a week now) I have not had a single hang. It sometimes temporarily blocks scrolling or changing pages in the web browser, but so far it has always released once the Flash content has finished loading.

comment:6 Changed 10 years ago by Silvan Scherrer

The SMP problem is known. We are trying to solve it as part of the Java development work; it's a common Odin problem.

See also http://svn.netlabs.org/odin32/ticket/28 for progress on that.

comment:7 Changed 9 years ago by Silvan Scherrer

Milestone: Beta1
Priority: changed from major to Feedback Pending

Is this still an issue with the latest Odin build?

comment:8 Changed 9 years ago by David McKenna

Yes, I still get hangs with the latest ODIN, although probably not as frequently. The websites previously listed are still good for a quick hang...

If there is a more rigorous way to test this problem I am willing to help test... just let me know how.

comment:9 Changed 9 years ago by ncattng

Cc: ncattng added
Keywords: Food Stamps added
Priority: changed from Feedback Pending to trivial
Summary: changed from Frequent Hangs to Login to https://www.humanservices.state.pa.us/Compass.Web/CMHOM.aspx <Apply for benefits>
Type: changed from defect to task
Last edited 9 years ago by ncattng

comment:10 Changed 9 years ago by ncattng

Keywords: delete added; Food Stamps removed
Summary: changed from Login to https://www.humanservices.state.pa.us/Compass.Web/CMHOM.aspx <Apply for benefits> to delete

comment:11 Changed 9 years ago by abwillis

One workaround is to run mpunsafe on the Firefox or SeaMonkey executable; it pins the browser to one CPU.

comment:12 Changed 9 years ago by Silvan Scherrer

Cc: ncattng removed
Keywords: delete removed
Priority: changed from trivial to Feedback Pending
Summary: changed from delete to Frequent Hangs
Type: changed from task to defect

comment:13 Changed 9 years ago by dmik

Tried the GCC Flash and GCC Odin, compared them with the old Flash/Odin, and don't see any big difference in behavior so far. In both cases, it eventually hangs after playing some Flash videos. In UNI mode it happens later; in SMP mode, much sooner (usually after a few seconds of playback).

I have a feeling that it is somehow related to message processing. Under some circumstances, Flash breaks PM (Presentation Manager) through Odin. I will try to debug it.

comment:14 Changed 9 years ago by dmik

I see that when using the debug version of Odin and Flash, Firefox frequently terminates while playing a Flash video (this may show up as hangs in the release version). In the logs, it looks like the timer thread, when it terminates itself for some reason, decides to terminate the whole process. I can't prove or disprove this yet. Will continue experimenting.

comment:15 Changed 9 years ago by dmik

Tried disabling audio in Odin. Playback lasts longer but then crashes. The crash is in CreateThread() (breakpoint at #122). Investigating.

comment:16 Changed 9 years ago by dmik

Enabling the really heavy logging (the libc one) in addition to all other logs makes the playback quite stable (in terms of not crashing for a very long time). This is a good indicator that we have another timing problem.

comment:17 Changed 9 years ago by dmik

BTW, I reproduced a crash in CreateThread() once: DosCreateThread returns 164 and the application terminates on a breakpoint. 164 is ERROR_MAX_THRDS_REACHED. I see many hundreds of threads in the log file, and all of them seem to be timer threads started by OS2Timer in the WINMM code. This is definitely wrong.

comment:18 Changed 9 years ago by dmik

In either case, a new thread for each timer doesn't look right. I will study the timer code more closely to see how this can be optimized.
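One common alternative to a thread per timer is a single dispatcher thread that keeps pending timers in a min-heap ordered by due time and sleeps until the earliest one is due. The sketch below illustrates that general technique in Python; it is not the Odin OS2Timer code, and the class `TimerQueue` and its methods are hypothetical names chosen for illustration.

```python
import heapq
import itertools
import threading
import time
from typing import Callable


class TimerQueue:
    """Hypothetical sketch: one worker thread services every timer,
    instead of starting a new thread for each timer event."""

    def __init__(self) -> None:
        # Entries are (due_time, sequence_number, callback); the heap keeps
        # the earliest due time at index 0.
        self._heap: list[tuple[float, int, Callable[[], None]]] = []
        self._seq = itertools.count()  # tie-breaker for equal due times
        self._cond = threading.Condition()
        self._stopping = False
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def schedule(self, delay_s: float, callback: Callable[[], None]) -> None:
        """Fire callback once, roughly delay_s seconds from now."""
        with self._cond:
            due = time.monotonic() + delay_s
            heapq.heappush(self._heap, (due, next(self._seq), callback))
            self._cond.notify()  # new entry may now be the earliest one

    def stop(self) -> None:
        """Ask the worker to exit on its own; pending timers are dropped."""
        with self._cond:
            self._stopping = True
            self._cond.notify()
        self._worker.join()

    def _run(self) -> None:
        while True:
            with self._cond:
                while not self._stopping:
                    if not self._heap:
                        self._cond.wait()  # sleep until something is scheduled
                        continue
                    due, _, cb = self._heap[0]
                    remaining = due - time.monotonic()
                    if remaining <= 0:
                        heapq.heappop(self._heap)
                        break  # earliest timer is due: fire it below
                    self._cond.wait(timeout=remaining)
                else:
                    return  # stop() was called
            cb()  # run the callback outside the lock
```

However many timers are scheduled, this design uses only one extra thread, so a burst of timer events cannot push the process toward a thread limit such as ERROR_MAX_THRDS_REACHED. Callbacks run outside the lock, so an expensive callback delays later timers but does not block `schedule()`.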

comment:19 Changed 9 years ago by dmik

OK, I have spotted three different problems so far:

  1. Death because of too many OS2Timer threads (reaching the per-process thread limit).
  2. Death because of the process termination when ending one of the OS2Timer threads.
  3. Death because the shared memory block in Odin (which is now 2 MB max) is exhausted.

I don't fully understand the reason for 1, but my guess is that newly created timer threads don't get enough CPU time to fire their event and terminate, because some other thread uses it all (while continuing to start new timer threads). I will try to play with the priority of these timer threads to see if it helps.

Problem 2 may be caused by the fact that timer threads are too often killed with TerminateThread() instead of being allowed to terminate normally (an Odin timer implementation bug). TerminateThread() is dangerous: it may leave some objects in an inconsistent state, and this may cause the process termination. At least when I disable this TerminateThread() code, I no longer see the process termination.

Problem 3 may happen because there is some leak in Odin (and in the OS2Timer code in particular). I haven't checked all of it yet, but I will. At least, increasing the shared heap size to 10 MB lets me run Flash without crashes for quite a long time.

The above problems affect Flash in the first place because Flash seems to use Win32 multimedia timers a lot, and the OS2Timer class is what implements these timers in Odin. Moreover, some Flash content may require more timer events than others and will therefore hang or crash (due to the above problems) sooner and more often.

I will commit fixes after I've done more checking and testing.

comment:20 Changed 9 years ago by dmik

I removed the TerminateThread() usage, and this seems to fix the main problem with timers in Flash: my own tests (and other people's tests) show that it can run quite smoothly without crashes now. Moreover, even if I restore the shared heap size to 2MB, I still don't get crashes. This may mean that TerminateThread()/ExitThread() leak resources and eventually exhaust the shared memory pool when used too often (as in the case of timer threads). I created http://svn.netlabs.org/odin32/ticket/62 to track this problem separately.
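The change described above replaces forced termination with threads that end normally. A minimal Python sketch of that cooperative pattern, assuming a one-shot timer thread (the class `CancellableTimer` is a hypothetical name, not Odin's code): `cancel()` only sets an event, and the thread exits by returning from `run()`, so it is never stopped halfway through an operation that could leave shared state inconsistent.

```python
import threading
from typing import Callable


class CancellableTimer(threading.Thread):
    """Hypothetical one-shot timer thread that is cancelled cooperatively:
    cancel() only sets a flag, and the thread returns from run() by itself."""

    def __init__(self, delay_s: float, callback: Callable[[], None]) -> None:
        super().__init__(daemon=True)
        self._delay_s = delay_s
        self._callback = callback
        self._cancelled = threading.Event()
        self.fired = False

    def cancel(self) -> None:
        # Request an early exit; the thread is never killed from outside.
        self._cancelled.set()

    def run(self) -> None:
        # Event.wait() returns True as soon as cancel() is called,
        # or False once the timeout elapses without cancellation.
        if not self._cancelled.wait(self._delay_s):
            self.fired = True
            self._callback()
        # Returning from run() is the normal, clean thread exit.
```

Python's standard `threading.Timer` follows the same cancel-then-return design, and Python deliberately provides no API for killing another thread from outside.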

However, I increased the shared heap size to 8MB anyway. The memory is committed only as needed, so this should not hurt, and it should improve application stability in case of leaks elsewhere, since the Odin code generally doesn't check whether a memory allocation fails and will therefore hang or crash if one does.

So, problems 2 and 3 are solved now.

Last edited 9 years ago by dmik

comment:21 Changed 9 years ago by dmik

I also changed the priority logic for timer threads, which should prevent cases where the timer callback does something expensive at high priority and blocks other threads.

comment:22 Changed 9 years ago by Silvan Scherrer

Please test with Odin 0.8.2 and the latest Flash DLL, which should be available on BetaZone.

comment:23 Changed 9 years ago by David McKenna

I don't see a new Flash DLL anywhere on the BetaZone... do you know where it is located?

comment:24 Changed 9 years ago by Silvan Scherrer

It should be there, but sometimes it takes longer to appear. Even with the old Flash preview2 DLL it should be a lot more stable, since all the fixes went into Odin and not into the plugin.

comment:25 Changed 9 years ago by David McKenna

Yes, I have been using ODIN 0.8.2 since it came out and it is much more stable... as long as I do not use LIBC064. I put LIBC063.DLL (from the libc063-csd3 package) in my SeaMonkey directory and run it with LIBPATHSTRICT, and I have yet to have a hard hang. If I don't do that, I get hangs; maybe not as many, but enough to make me want to disable Flash.

comment:26 Changed 9 years ago by Silvan Scherrer

Are you sure you have only one libc063 after installing libc064? I want to rule out one application loading the old libc063 while another tries to load the forwarder DLL.

Last edited 9 years ago by Silvan Scherrer

comment:27 Changed 9 years ago by David McKenna

Yes.... positive.

comment:28 Changed 9 years ago by Silvan Scherrer

Please use the debug Odin build and collect logs. The Odin debug zip is also on the netlabs FTP server. Be sure to run SeaMonkey from the command line with logging enabled; to enable logging, set the environment variable WIN32LOG_ENABLED:

SET WIN32LOG_ENABLED=1

Please use only libc064 and make absolutely sure there is no older libc063 DLL anywhere on your system.

comment:29 Changed 9 years ago by Silvan Scherrer

David, if you give me your e-mail address, I can send you a debug plugin with some more logging.

comment:30 Changed 9 years ago by David McKenna

Sure... davidmckenna<att>comcast[ddot]net

comment:31 Changed 9 years ago by David McKenna

Using the debug plugin and debug Odin with LIBC064 I do not see any of the hangs I was getting before.

comment:32 Changed 9 years ago by dmik

Resolution: fixed
Status: changed from new to closed

I think that the hangs are basically gone now. Feel free to open a new defect if you discover new hangs (hopefully not).
