
Issues with data HDD, kern.log and syslog explode to 9+ GB

Posted: Sun Aug 18, 2019 9:36 am
by stingray
I did a fresh install of 19.1 MATE earlier this year and it had been running great (or so I thought) until three days ago. Two log files (kern.log and syslog) ballooned to over 9 GB each and filled up my root partition within about 20 minutes. I got some help in this thread and got it running again, but it happened again yesterday. The first time, I was at least able to boot normally, log in, and work on fixing it from within. This time I can't log in at all: I just get kicked back to the login screen after entering my password.

Here is a sample of the errors that repeat almost endlessly and blew up the logs. Can someone shed some light on these errors? I can post more log data if that would help.

Code: Select all

Aug 15 09:07:39 ~user~ kernel: [143140.874683] 
Aug 15 09:07:39 ~user~ kernel: [143140.874684] btstack dump:
Aug 15 09:07:39 ~user~ kernel: [143140.874685] bn = 0, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874685] bn = b0d09f4, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874686] bn = 0, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874687] bn = b0d09f4, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874688] bn = 0, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874689] bn = b0d09f4, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874689] bn = 0, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874690] bn = ffff891a958387a0, index = 528
Aug 15 09:07:39 ~user~ kernel: [143140.874693] ERROR: (device sdb1): dtReadFirst [jfs]: btstack overrun
Aug 15 09:07:39 ~user~ kernel: [143140.874693] 
Aug 15 09:07:39 ~user~ kernel: [143140.874695] btstack dump:
Aug 15 09:07:39 ~user~ kernel: [143140.874695] bn = 0, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874696] bn = b0d09f4, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874696] bn = 0, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874697] bn = b0d09f4, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874698] bn = 0, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874699] bn = b0d09f4, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874700] bn = 0, index = 0
Aug 15 09:07:39 ~user~ kernel: [143140.874701] bn = ffff891a958387a0, index = 528
Aug 15 09:07:39 ~user~ kernel: [143140.874704] ERROR: (device sdb1): dtReadFirst [jfs]: btstack overrun
Aug 15 09:07:39 ~user~ kernel: [143140.874704] 
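
When a log grows to several gigabytes of repeats like the excerpt above, it can help to tally which messages dominate before reading any of it. The following is only a rough sketch: the function name and the prefix pattern are assumptions fitted to the line format shown above, not part of any standard tool.

```python
import re
from collections import Counter

# Assumed line layout, based on the excerpt above:
# "Aug 15 09:07:39 host kernel: [143140.874693] message text"
_PREFIX = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+kernel:\s+\[\s*\d+\.\d+\]\s?"
)


def top_kernel_messages(lines, limit=5):
    """Count kernel messages with the timestamp prefix stripped.

    Returns the `limit` most frequent (message, count) pairs.
    Lines that carry no message text are skipped.
    """
    counts = Counter()
    for line in lines:
        message = _PREFIX.sub("", line.rstrip("\n"), count=1)
        if message.strip():
            counts[message] += 1
    return counts.most_common(limit)
```

A file object can be passed directly, so a huge log is read line by line rather than loaded into memory, for example `top_kernel_messages(open("/var/log/kern.log", errors="replace"))`.
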
I have two hard drives, a 250 GB SSD and a 3 TB HDD. Here's "lsblk":

Code: Select all

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 232.9G  0 disk 
├─sda1   8:1    0   250M  0 part /boot/efi
├─sda2   8:2    0  31.8G  0 part /
└─sda3   8:3    0 200.9G  0 part /home
sdb      8:16   0   2.7T  0 disk 
└─sdb1   8:17   0   2.7T  0 part /media/data
sr0     11:0    1  1024M  0 rom  
sdb is intended purely for additional space. I have several symlinks in /home (replacing Music, Videos, Pictures) pointing to counterparts in sdb1 so maybe that's a factor here? Also, I have Timeshift saving backups to sdb1. Otherwise, I can't think of any interdependencies that I have created between the drives. I thought I had a good scheme in place but maybe not. Something's obviously messed up now.

For good measure, here is the output of "inxi -Fxz":

Code: Select all

System:
  Host: ~user~ Kernel: 4.15.0-58-generic x86_64 bits: 64 compiler: gcc 
  v: 7.4.0 Desktop: MATE 1.22.0 Distro: Linux Mint 19.2 Tina 
  base: Ubuntu 18.04 bionic 
Machine:
  Type: Desktop System: Gigabyte product: Z97X-UD3H-BK v: N/A 
  serial: <filter> 
  Mobo: Gigabyte model: Z97X-UD3H-BK-CF v: x.x serial: <filter> 
  UEFI: American Megatrends v: F6 date: 06/17/2014 
Battery:
  Device-1: hidpp_battery_0 model: Logitech Wireless Mouse M510 charge: 55% 
  status: Discharging 
CPU:
  Topology: Quad Core model: Intel Core i5-4690S bits: 64 type: MCP 
  arch: Haswell rev: 3 L2 cache: 6144 KiB 
  flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 25600 
  Speed: 800 MHz min/max: 800/3900 MHz Core speeds (MHz): 1: 800 2: 800 
  3: 800 4: 800 
Graphics:
  Device-1: Intel Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics 
  vendor: Gigabyte driver: i915 v: kernel bus ID: 00:02.0 
  Display: x11 server: X.Org 1.19.6 driver: modesetting unloaded: fbdev,vesa 
  resolution: 1920x1080~60Hz 
  OpenGL: renderer: Mesa DRI Intel Haswell Desktop v: 4.5 Mesa 19.0.8 
  direct render: Yes 
Audio:
  Device-1: Intel Xeon E3-1200 v3/4th Gen Core Processor HD Audio 
  driver: snd_hda_intel v: kernel bus ID: 00:03.0 
  Device-2: Intel 9 Series Family HD Audio vendor: Gigabyte 
  driver: snd_hda_intel v: kernel bus ID: 00:1b.0 
  Sound Server: ALSA v: k4.15.0-58-generic 
Network:
  Device-1: Intel Ethernet I217-V vendor: Gigabyte driver: e1000e v: 3.2.6-k 
  port: f080 bus ID: 00:19.0 
  IF: eno1 state: down mac: <filter> 
  Device-2: Atheros AR9271 802.11n type: USB driver: ath9k_htc 
  bus ID: 3-13:9 
  IF: wlx64700225f629 state: up mac: <filter> 
Drives:
  Local Storage: total: 2.96 TiB used: 380.13 GiB (12.6%) 
  ID-1: /dev/sda vendor: Samsung model: SSD 850 EVO 250GB size: 232.89 GiB 
  ID-2: /dev/sdb vendor: Western Digital model: WD30EZRZ-00WN9B0 
  size: 2.73 TiB 
Partition:
  ID-1: / size: 31.18 GiB used: 11.15 GiB (35.7%) fs: ext4 dev: /dev/sda2 
  ID-2: /home size: 196.68 GiB used: 23.25 GiB (11.8%) fs: ext4 
  dev: /dev/sda3 
Sensors:
  System Temperatures: cpu: 29.8 C mobo: 27.8 C 
  Fan Speeds (RPM): N/A 
Info:
  Processes: 194 Uptime: 5h 36m Memory: 15.53 GiB used: 2.14 GiB (13.8%) 
  Init: systemd runlevel: 5 Compilers: gcc: 7.4.0 Shell: bash v: 4.4.20 
  inxi: 3.0.32 

Re: Issues with data HDD, kern.log and syslog explode to 9+ GB

Posted: Sun Aug 18, 2019 4:57 pm
by catweazel
stingray wrote:
Sun Aug 18, 2019 9:36 am

Code: Select all

Aug 15 09:07:39 ~user~ kernel: [143140.874704] ERROR: (device sdb1): dtReadFirst [jfs]: btstack overrun
Something's obviously messed up now.
File system corruption. Back up your treasured stuff and format the drive with a more suitable file system. That's the first place to start.

Re: Issues with data HDD, kern.log and syslog explode to 9+ GB

Posted: Sun Aug 18, 2019 5:41 pm
by gm10
Well, JFS shouldn't constantly corrupt, so rule out the usual suspects: hardware defects in the drive or elsewhere in the system, the kernel version and/or series, etc.

Not sure why Moem told you to make a second thread by the way, this is the same issue and still not resolved. Oh well.

Re: Issues with data HDD, kern.log and syslog explode to 9+ GB

Posted: Sun Aug 18, 2019 11:37 pm
by stingray
gm10 wrote:
Sun Aug 18, 2019 5:41 pm
Well, JFS shouldn't constantly corrupt, so rule out the usual suspects: hardware defects in the drive or elsewhere in the system, the kernel version and/or series, etc.

Not sure why Moem told you to make a second thread by the way, this is the same issue and still not resolved. Oh well.
I think the rationale is that even though it's the same root issue, it turned out to be more than just a newbie question. Maybe I should have picked the general hardware forum rather than the one for mounting partitions, but oh well.

I've used JFS with other systems before without problems. I picked it here because it is reported to be less stressful on drives, but I'm not dead set against other, more recent options. Now I'm worried it's a fault with the drive itself. It's only been in use for a few months, but I bought it a couple of years ago so it's probably out of warranty. I would hate to scrap a "new" 3 TB drive, but that might be as easy a fix as there is, if the drive is the problem.

When I try to boot I get "ata2.00: COMRESET failed (errno=16)" several times which again points to corruption/filesystem errors. I tried using different SATA cables and ports; it didn't help with booting but I'll have to try that when I get up and running again.

At the recovery command prompt I tried to run fsck on sdb1 but it couldn't open it. sdb was listed in /dev. I deleted the huge log files so at least the root partition has space now.

Latest kernel is 4.15.0-58-generic, also have 55, 54, and 20 available. Tried recovery mode with 20, it had the same COMRESET error, then proceeded to fsck the SSD. It mounted /boot and /home, then the fsck of the HDD failed. After a smattering of errors, shown below (may contain typos), I ended up just resetting and booting a live DVD so I could at least shut down gracefully. First I tried mounting /dev/sdb1 and got this error: "wrong fs type, bad option, bad superblock, missing codepage or helper program, or other error". I also edited fstab and commented out the HDD entry so the system won't try to mount it, although the drive won't be connected anyway.

I'm flying out of town Tuesday morning so tomorrow night will be the last time I will be able to try anything for the rest of the week. Tomorrow, I will try booting without HDD connected, and try accessing HDD via my USB dock.

Code: Select all

exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x6 frozen
failed command: READ FPDMA QUEUED
failed to enable AA (error_mask=0x5)
print_req_error: I/O error, dev sdf, sector...
Buffer I/O error on sdf1, logical block..., lost async page write
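
The fstab edit mentioned in this post (commenting out the HDD entry so a missing or failing drive does not hold up boot) could be scripted roughly as follows. This is a hedged sketch only: the function name is hypothetical, and it works on the file's text rather than touching /etc/fstab itself, so the result can be reviewed before anything is written back.

```python
def comment_out_mount(fstab_text, mount_point):
    """Return fstab_text with every active entry for mount_point commented out.

    An entry is a non-comment line whose second whitespace-separated
    field equals mount_point. Lines that are already comments are left alone.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = bool(fields) and not fields[0].startswith("#")
        if is_entry and len(fields) >= 2 and fields[1] == mount_point:
            line = "# " + line
        out.append(line)
    # Preserve the trailing newline only if the input had one.
    trailing = "\n" if fstab_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```

For the layout in this thread the call would be `comment_out_mount(text, "/media/data")`. Keeping a copy of the original file before saving any change is a sensible precaution.
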

Re: Issues with data HDD, kern.log and syslog explode to 9+ GB

Posted: Sun Aug 18, 2019 11:56 pm
by deck_luck
I would check the drive's S.M.A.R.T. health. It may already be in a failed state. Use the Disks utility to check the S.M.A.R.T. status. I prefer to install and configure the smartmontools package: its smartd daemon runs in the background and reports S.M.A.R.T. disk-related events through the syslog facility. The package also includes the smartctl command to query the status and to initiate extended tests.
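
As a rough illustration of the smartctl query described above, the sketch below runs `smartctl -H` and pulls out the overall verdict. The function names are made up for this example, and the pattern assumes the "SMART overall-health self-assessment test result" line that smartctl prints for ATA drives; other drive types may word it differently. It needs root and the smartmontools package installed.

```python
import re
import subprocess

# Assumption: ATA-style report line, e.g.
# "SMART overall-health self-assessment test result: PASSED"
_HEALTH = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\S+)"
)


def parse_smart_health(report):
    """Return the verdict word from a smartctl -H report, or None if absent."""
    match = _HEALTH.search(report)
    return match.group(1) if match else None


def smart_health(device):
    """Run `smartctl -H` on device and return the parsed verdict.

    smartctl uses its exit status as a bit mask, so a non-zero
    return code is not treated as an exception here.
    """
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_smart_health(result.stdout)
```

For the data drive in this thread that would be `smart_health("/dev/sdb")`. A `None` result means the report line was not found, which is worth investigating in its own right since the drive may not be responding at all. An extended self-test can be started separately with `smartctl -t long` on the same device.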

Re: Issues with data HDD, kern.log and syslog explode to 9+ GB

Posted: Mon Aug 19, 2019 12:39 am
by gm10
stingray wrote:
Sun Aug 18, 2019 11:37 pm
Latest kernel is 4.15.0-58-generic, also have 55, 54, and 20 available. Tried recovery mode with 20, it had the same COMRESET error
Ok, so now you know it's not a regression in the 4.15 series. Next step is to try the latest kernel in the 5.0 series (Update Manager > View > Linux kernels). Also check for firmware updates for your UEFI and potentially the drive.

That you should check your drive's health, i.e. SMART status, goes without saying.

Re: Issues with data HDD, kern.log and syslog explode to 9+ GB

Posted: Mon Aug 19, 2019 2:05 am
by deck_luck
"That you should check your drive's health, i.e. SMART status, goes without saying."

Was my suggestion offensive to you? I do not understand your tone.

Re: Issues with data HDD, kern.log and syslog explode to 9+ GB

Posted: Mon Aug 19, 2019 5:55 am
by gm10
deck_luck wrote:
Mon Aug 19, 2019 2:05 am
Was my suggestion offensive to you? I do not understand your tone.
Oh no, not at all, your suggestion was very good and I meant to say that he should do what you suggested no matter what he does with the kernels. Apologies for my poor choice of words. I can see why you understood it the way you did, but I can assure you that thought didn't even cross my mind.

Re: Issues with data HDD, kern.log and syslog explode to 9+ GB

Posted: Mon Aug 19, 2019 8:49 am
by stingray
I unplugged the HDD completely and booted up with no problems. I SMART scanned the SSD and it checked out OK. (I like the idea of setting that up as an automatic check.) I reconfigured Timeshift to back up to /home. It's running well again and should get me through the day. Of course, nothing pointed to the SSD being the problem. Tonight, I'll see if I can get anywhere with the HDD.

Is there any downside to upgrading to the 5.0 kernel? I've been sticking with the Mint standard, thinking later versions would be a beta situation. Not that 5.0 is beta, but specifically running Mint on 5.0. Is it just a matter of LTS?

Re: Issues with data HDD, kern.log and syslog explode to 9+ GB

Posted: Mon Aug 19, 2019 10:52 am
by gm10
stingray wrote:
Mon Aug 19, 2019 8:49 am
Is there any downside to upgrading to the 5.0 kernel? I've been sticking with the Mint standard, thinking later versions would be a beta situation. Not that 5.0 is beta, but specifically running Mint on 5.0. Is it just a matter of LTS?
From the support perspective yes, it's only a matter of the support duration. The 5.0 kernel is fully supported only for a few months while 4.15 will be supported until 2023.

Re: Issues with data HDD, kern.log and syslog explode to 9+ GB

Posted: Mon Aug 19, 2019 11:10 pm
by stingray
That makes sense. I went ahead and installed 5.0.0-25.

Did not have time to do anything with the HDD tonight, though. I'll be out of town for the rest of this week. I'll post again when I have something to report.

Thanks to everyone who took the time to reply!