[FFmpeg-trac] #4178(avformat:open): Opus audio in MKV container
FFmpeg
trac at avcodec.org
Sun Apr 22 09:35:05 EEST 2018
#4178: Opus audio in MKV container
-------------------------------------+-------------------------------------
Reporter: agressiv | Owner: vigneshvg
Type: defect | Status: open
Priority: important | Component: avformat
Version: git-master | Resolution:
Keywords: mkv opus | Blocked By:
regression | Reproduced by developer: 1
Blocking: |
Analyzed by developer: 1 |
-------------------------------------+-------------------------------------
Comment (by mkver):
I can reliably create such files with ffmpeg and have a theory on why this
is happening. The ultrashort answer is: Bad things can happen if the
timestamps that the libopus encoder receives aren't perfect.
Before I come to the long answer, let me add that I used the current
git-master to produce the framehash logs that you will see. In more detail:
{{{
ffmpeg version N-90800-g8592ae1a1e Copyright (c) 2000-2018 the FFmpeg
developers
built with gcc 7.3.0 (Rev1, Built by MSYS2 project)
configuration: --disable-static --enable-shared --disable-amf --disable-
cuda --disable-cuvid --disable-d3d11va --disable-nvenc --disable-ffnvcodec
--disable-debug --enable-libopus --enable-libbluray --enable-libmfx
--enable-libsoxr --enable-libwavpack --enable-gpl --enable-openssl
--enable-avisynth --enable-libfdk-aac --enable-libzvbi --disable-
encoder=dca --disable-encoder=nellymoser --disable-encoder=real_144
--disable-encoder=truehd --disable-encoder=vorbis --disable-encoder=sonic
--disable-encoder=sonicls --disable-encoder=amv --disable-encoder=asv1
--disable-encoder=asv2 --disable-encoder=flashsv --disable-
encoder=flashsv2 --disable-encoder=roqvideo --disable-encoder=svq1
--disable-encoder=zmbv --disable-encoder=zlib --disable-encoder=snow
--disable-encoder=cinepak --disable-encoder=a64multi --disable-
encoder=a64multi5 --disable-encoder=h261 --disable-encoder=h263 --disable-
encoder=h263p --disable-encoder=wmv7 --disable-encoder=wmav1 --disable-
encoder=wmav2 --disable-encoder=wmv8 --enable-nonfree --shlibdir=/local64
/bin-video
libavutil 56. 15.100 / 56. 15.100
libavcodec 58. 19.100 / 58. 19.100
libavformat 58. 13.100 / 58. 13.100
libavdevice 58. 4.100 / 58. 4.100
libavfilter 7. 19.100 / 7. 19.100
libswscale 5. 2.100 / 5. 2.100
libswresample 3. 2.100 / 3. 2.100
libpostproc 55. 2.100 / 55. 2.100
}}}
The other logs below therefore omit the version banner.
a) First some information about granule positions: The granule positions
in ogg pages indicate the position in the stream after decoding all
packets which are completely within that page. They are restricted as
follows:
i) All pages with completed packets except the first and the last MUST
have a granule position equal to the number of samples contained in
packets that complete on that page plus the granule position of the most
recent page with completed packets. (From section 4 of
[https://tools.ietf.org/html/rfc7845 RFC 7845].)
ii) If a page has the 'end of stream' flag set, then instead of the above
its granule position indicates end trimming: the number of samples to trim
away at the end is the number of samples contained in the packets that
complete on that page minus the difference between the granule position of
said page and that of the most recent earlier page with completed packets.
If there was no earlier page with completed packets, one should work as if
its granule position were zero (in this case one also has to apply the
pre-skip). (This is section 4.4 of RFC 7845.)
iii) If the first page with completed packets isn't also the last page
(otherwise ii) applies), then it must have a granule position that is >= the
sum of the number of samples contained in packets that complete on that
page (the pre-skip is ignored in calculating the sum) so that there are no
negative granule positions when working backwards. The granule position
may be larger than the sum (useful for synchronization with other streams
in the same multiplex); if the sum is larger then the stream is completely
invalid (yes, the whole stream, not only the first page or the samples
which would have negative granule positions). (This is section 4.5 of RFC
7845.)
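Rules i) to iii) can be sketched as a small checker. This is only an
illustration in Python, not code from libavformat or opus-tools; the
function name check_granules and the tuple layout are assumptions made for
this example, and for an 'end of stream' page it only computes the end
trimming without judging it.

```python
def check_granules(pages):
    """Check Ogg Opus page granule positions against rules i) to iii).

    pages: list of (granule_pos, samples_completed, end_of_stream) tuples,
    one per page on which at least one packet completes, in stream order.
    Returns (problems, end_trim): a list of messages and the number of
    samples trimmed at the end (None if no page has 'end of stream' set).
    """
    problems = []
    end_trim = None
    prev = None  # granule position of the most recent page with packets
    for index, (granule, samples, eos) in enumerate(pages):
        if eos:
            # Rule ii): a missing earlier page counts as granule position 0.
            base = 0 if prev is None else prev
            end_trim = samples - (granule - base)
        elif prev is None:
            # Rule iii): working backwards must not go below zero.
            if granule < samples:
                problems.append(
                    f"page {index}: granule {granule} < {samples} samples, "
                    "stream is invalid")
        elif granule != prev + samples:
            # Rule i): the granule position advances by the sample count.
            problems.append(
                f"page {index}: granule {granule}, expected {prev + samples}")
        prev = granule
    return problems, end_trim
```

For example, check_granules([(24000, 48000, False)]) reports one problem: a
first page that holds 48000 samples but carries granule position 24000
violates iii).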
b) The easiest way to produce malformed files is by using a negative
-itsoffset:
{{{
ffmpeg -itsoffset -0.5 -i test.dts -c:a libopus offset.-0.5.opus
}}}
opusinfo (a part of the opus-tools package from the creators of the opus
codec) complains about this file:
{{{
Processing file "offset.-0.5.opus"...
New logical stream (#1, serial: 7d2420f4): type opus
Encoded with Lavf58.13.100
User comments section follows...
encoder=Lavc58.19.100 libopus
WARNING: Samples with negative granpos in stream 1
Opus stream 1:
Pre-skip: 312
Playback gain: 0 dB
Channels: 2
Original sample rate: 48000Hz
Packet duration: 20.0ms (max), 20.0ms (avg), 20.0ms (min)
Page duration: 1000.0ms (max), 968.4ms (avg), 20.0ms (min)
Total data length: 386932 bytes (overhead: 0.811%)
Playback length: 0m:30.005s
Average bitrate: 103.2 kb/s, w/o overhead: 102.3 kb/s
Logical stream 1 ended
}}}
Let's use the framehash muxer to see what the timestamps are when they
leave the encoder:
{{{
ffmpeg -itsoffset -0.5 -i fl.dts -c:a libopus -f framehash -hash crc32 -
...
#format: frame checksums
#version: 2
#hash: CRC32
#extradata 0, 19, ea5d642a
#software: Lavf58.13.100
#tb 0: 1/48000
#media_type 0: audio
#codec_id 0: opus
#sample_rate 0: 48000
#channel_layout 0: 3
#channel_layout_name 0: stereo
#stream#, dts, pts, duration, size, hash
0, -24312, -24312, 960, 425, 996864ad
0, -23352, -23352, 960, 241, 1fcc1d4d
0, -22392, -22392, 960, 228, 05f1dd79
0, -21432, -21432, 960, 225, e56a7998
0, -20472, -20472, 960, 224, a12a261d
0, -19512, -19512, 960, 226, 27020d0e
0, -18552, -18552, 960, 249, ab31aeb9
0, -17592, -17592, 960, 241, 44e9b2e4
0, -16632, -16632, 960, 241, 8d5dbc65
0, -15672, -15672, 960, 253, 54c603d7
0, -14712, -14712, 960, 256, f9acaea3
0, -13752, -13752, 960, 254, 308a7027
0, -12792, -12792, 960, 262, 297c12b8
0, -11832, -11832, 960, 271, 86b889ca
0, -10872, -10872, 960, 266, 07e95927
0, -9912, -9912, 960, 271, eedd9414
0, -8952, -8952, 960, 275, 856a747f
0, -7992, -7992, 960, 275, 48c4343e
0, -7032, -7032, 960, 281, a54c56ad
0, -6072, -6072, 960, 275, a5ede609
0, -5112, -5112, 960, 271, f5795567
0, -4152, -4152, 960, 270, cb1f8e24
0, -3192, -3192, 960, 282, 9c81d325
0, -2232, -2232, 960, 287, c4bec144
0, -1272, -1272, 960, 276, 6978978a
0, -312, -312, 960, 280, 928ce969
0, 648, 648, 960, 288, a8b67809
0, 1608, 1608, 960, 289, d08817ee
0, 2568, 2568, 960, 281, 785f7424
0, 3528, 3528, 960, 271, 01a150fd
0, 4488, 4488, 960, 279, 1d1a6926
0, 5448, 5448, 960, 299, 4ad6192a
0, 6408, 6408, 960, 401, 1ba1ba43
0, 7368, 7368, 960, 297, 722e745b
0, 8328, 8328, 960, 399, bb637945
0, 9288, 9288, 960, 296, b746197e
0, 10248, 10248, 960, 276, 44dde335
0, 11208, 11208, 960, 279, 3ffcb2f5
0, 12168, 12168, 960, 293, 481af07f
0, 13128, 13128, 960, 286, fbc2d89c
0, 14088, 14088, 960, 278, 2983e9a8
0, 15048, 15048, 960, 283, 6a8c6b1b
0, 16008, 16008, 960, 285, b7b3a531
0, 16968, 16968, 960, 285, 5ee67d70
0, 17928, 17928, 960, 266, f3ad421b
0, 18888, 18888, 960, 261, bea0961e
0, 19848, 19848, 960, 272, d463ae16
0, 20808, 20808, 960, 383, 3b03279b
0, 21768, 21768, 960, 278, 9401e990
0, 22728, 22728, 960, 270, abfad09a
0, 23688, 23688, 960, 295, 1da5ee48
0, 24648, 24648, 960, 266, 21c45f34
0, 25608, 25608, 960, 256, 5003d43c
0, 26568, 26568, 960, 274, dae2fa79
0, 27528, 27528, 960, 268, 80b438cb
}}}
-0.5s corresponds to 24000 samples, and the remaining difference of 312
samples is due to libopus' pre-skip of 312 samples. If one analyzes the
generated file directly one sees that the first page contains 50 packets
with 960 samples each, i.e. 48000 samples (of which the first 312 are
invalid), but the granule position of the first page is 24000; if one
simply calculates backwards, this means that the first packet would have
started at -24000, which violates a) iii) above. Needless to say, the first
page doesn't have the 'end of stream' flag set.
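These numbers can be re-derived from the framehash log. The sketch below is
plain Python arithmetic; the relation "granule position = end timestamp of
the last packet on the page + pre-skip" is an assumption inferred from the
values seen in the file, not taken from the muxer source.

```python
SAMPLE_RATE = 48000    # Hz, from the framehash header
PRE_SKIP = 312         # samples, as reported by opusinfo
FRAME = 960            # samples per Opus packet (20 ms)
PACKETS_ON_PAGE = 50   # packets completing on the first page

offset = round(-0.5 * SAMPLE_RATE)                    # -24000
first_pts = offset - PRE_SKIP                         # -24312
last_pts = first_pts + (PACKETS_ON_PAGE - 1) * FRAME  # 22728
page_granule = last_pts + FRAME + PRE_SKIP            # 24000 (assumed relation)
page_samples = PACKETS_ON_PAGE * FRAME                # 48000
implied_start = page_granule - page_samples           # -24000

print(first_pts, page_granule, implied_start)
```

A negative implied_start is exactly the condition that a) iii) forbids.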
Given the fact that according to the spec the whole stream has to be
treated as invalid it is actually strange that opusinfo emits only a
warning and not an error. The reference decoder, too, doesn't treat the
stream as invalid: Instead it treats the first page as if it has end
trimming although it doesn't have the 'end of stream' flag. In our sample
this means that from the first page with 48000 samples the last 24000
samples are stripped away because of end trimming and the first 312
because of pre-skip. And indeed comparing the input file with the output
of the reference decoder shows that they are essentially the same for the
first 23688 samples and then totally different.
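The 23688 follows directly if one assumes, as described above, that the
reference decoder applies the end trimming of a) ii) to the first page even
without the 'end of stream' flag. A minimal sketch (the function name is
made up for this example):

```python
def samples_kept(page_samples, page_granule, pre_skip):
    """Samples surviving on a page that is treated as end-trimmed."""
    end_trim = page_samples - page_granule  # samples cut off at the end
    return page_samples - end_trim - pre_skip


kept = samples_kept(page_samples=48000, page_granule=24000, pre_skip=312)
print(kept)
```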
This shows that the ogg muxer should automatically shift the granule
positions to make them (both those implied and those explicitly written)
non-negative. (But there is a problem here: In ogg, the relationship
between a timestamp and the granule position is codec-dependent and it
needn't be a linear relationship like for opus, so shifting other tracks
might be complicated.)
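For a single Opus track such a shift could look like the following sketch.
It is a hypothetical helper, not the ogg muxer's actual code, and it assumes
the linear relation "granule position = timestamp + pre-skip" (in 48 kHz
samples) that the values from b) suggest.

```python
def granule_shift(first_pts, pre_skip):
    """Smallest non-negative shift (in samples) that makes the implied
    granule position at the start of the first packet non-negative."""
    return max(0, -(first_pts + pre_skip))


shift = granule_shift(first_pts=-24312, pre_skip=312)
# Granule position of the first page from b) after shifting:
shifted_first_page = 24000 + shift
print(shift, shifted_first_page)
```

With the values from b) the shifted first page gets granule position 48000,
which equals the 48000 samples on that page and therefore satisfies a) iii).
A stream that starts at -312 (the usual case) needs no shift.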
c) If one uses an itsoffset larger than the page_duration of the ogg
muxer, opusinfo complains even more: "Negative or zero granulepos (-14400)
on Opus stream outside of headers. This file was created by a buggy
encoder"
d) Here is another way to get into a situation like b), but without using
itsoffset. It has to do with odd behaviour (I'd call it a bug) of the
native opus decoder. Instead of stripping the pre-skip samples away like the
reference decoder does, it simply gives them negative timestamps. Here is
a part of framehash's output showing what this looks like for a
non-defective file:
{{{
#format: frame checksums
#version: 2
#hash: CRC32
#software: Lavf58.13.100
#tb 0: 1/48000
#media_type 0: audio
#codec_id 0: pcm_s16le
#sample_rate 0: 48000
#channel_layout 0: 3
#channel_layout_name 0: stereo
#stream#, dts, pts, duration, size, hash
0, -312, -312, 960, 3840, 1908c39f
0, 648, 648, 960, 3840, 239ecff4
0, 1608, 1608, 960, 3840, c5dd9714
0, 2568, 2568, 960, 3840, 1173d416
0, 3528, 3528, 960, 3840, e4e9ca53
0, 4488, 4488, 960, 3840, dbc3e9f0
0, 5448, 5448, 960, 3840, 2187b445
0, 6408, 6408, 960, 3840, 25180cb2
0, 7368, 7368, 960, 3840, 788bf31b
0, 8328, 8328, 960, 3840, 1c3b1f55
0, 9288, 9288, 960, 3840, a67eae2f
0, 10248, 10248, 960, 3840, 17cc83a0
}}}
So if one uses an ordinary opus file as input, decodes it with the native
decoder (the default decoder) and encodes this with libopus, one is in the
very same situation as b). If one uses the libopus decoder, the result is
fine.
This behaviour of the native decoder is actually at the heart of #4692.
This was exactly the situation which made me realize what's going on. See
[https://forum.doom9.org/showthread.php?p=1839926#post1839926 here].
e) I can also explain anthontex's observation with the exception of the
part where he claims that streams <=5.1 seem to work. This time the root
cause is lacing (that is used by default by mkvmerge for e.g. dts/dca
tracks) probably coupled with strange timestamp rounding. Notice that
test.dts is actually a stereo dts track.
The timestamps from the dts file are good:
{{{
ffmpeg -i test.dts -f framehash -hash crc32 -
...
#format: frame checksums
#version: 2
#hash: CRC32
#software: Lavf58.13.100
#tb 0: 1/48000
#media_type 0: audio
#codec_id 0: pcm_s16le
#sample_rate 0: 48000
#channel_layout 0: 3
#channel_layout_name 0: stereo
#stream#, dts, pts, duration, size, hash
0, 0, 0, 512, 2048, 52bbda48
0, 512, 512, 512, 2048, 2b037e9f
0, 1024, 1024, 512, 2048, f69e3985
0, 1536, 1536, 512, 2048, 04f27523
0, 2048, 2048, 512, 2048, 0c9b0963
0, 2560, 2560, 512, 2048, de6e37eb
0, 3072, 3072, 512, 2048, 2230f372
0, 3584, 3584, 512, 2048, b4275a94
0, 4096, 4096, 512, 2048, e2efc7d5
0, 4608, 4608, 512, 2048, e6ff0c6f
0, 5120, 5120, 512, 2048, 43d5c355
0, 5632, 5632, 512, 2048, f689afdb
0, 6144, 6144, 512, 2048, 7ce06f4f
0, 6656, 6656, 512, 2048, d639e9c7
0, 7168, 7168, 512, 2048, 87aee60f
0, 7680, 7680, 512, 2048, 6e32d1e1
0, 8192, 8192, 512, 2048, 99b53229
0, 8704, 8704, 512, 2048, 46803053
0, 9216, 9216, 512, 2048, 4e4143b5
0, 9728, 9728, 512, 2048, 2116fa38
...
}}}
If one remuxes test.dts with mkvmerge and specifies a
timecode/timestamp-scale factor of 1000000, the timestamps aren't good any
more. (For files that don't have a video track, mkvmerge by default uses a
timecode/timestamp-scale factor small enough that one tick of the timebase
is shorter than one sample, so the timecodes/timestamps in the file are
actually sample accurate; if there is a video track, it defaults to
1000000, i.e. 1ms precision.) The merged file is called
"Test.Laced.Big.TS.mka" ("TS" means TimestampScale):
{{{
ffmpeg -i Test.Laced.Big.TS.mka -f framehash -hash crc32 -
...
#format: frame checksums
#version: 2
#hash: CRC32
#software: Lavf58.13.100
#tb 0: 1/48000
#media_type 0: audio
#codec_id 0: pcm_s16le
#sample_rate 0: 48000
#channel_layout 0: 3
#channel_layout_name 0: stereo
#stream#, dts, pts, duration, size, hash
0, 0, 0, 512, 2048, 52bbda48
0, 504, 504, 512, 2048, 2b037e9f
0, 1016, 1016, 512, 2048, f69e3985
0, 1512, 1512, 512, 2048, 04f27523
0, 2024, 2024, 512, 2048, 0c9b0963
0, 2536, 2536, 512, 2048, de6e37eb
0, 3048, 3048, 512, 2048, 2230f372
0, 3560, 3560, 512, 2048, b4275a94
0, 4072, 4072, 512, 2048, e2efc7d5
0, 4584, 4584, 512, 2048, e6ff0c6f
0, 5096, 5096, 512, 2048, 43d5c355
0, 5592, 5592, 512, 2048, f689afdb
0, 6104, 6104, 512, 2048, 7ce06f4f
0, 6616, 6616, 512, 2048, d639e9c7
0, 7128, 7128, 512, 2048, 87aee60f
0, 7640, 7640, 512, 2048, 6e32d1e1
0, 8184, 8184, 512, 2048, 99b53229
0, 8696, 8696, 512, 2048, 46803053
0, 9208, 9208, 512, 2048, 4e4143b5
0, 9720, 9720, 512, 2048, 2116fa38
0, 10232, 10232, 512, 2048, ffd0d2d3
0, 10744, 10744, 512, 2048, ab0e8d25
0, 11256, 11256, 512, 2048, d75d5dbf
0, 11768, 11768, 512, 2048, 495f20b4
0, 12280, 12280, 512, 2048, c73a83e5
0, 12792, 12792, 512, 2048, 1a8bd665
0, 13304, 13304, 512, 2048, 37baf488
0, 13800, 13800, 512, 2048, 75a43386
0, 14312, 14312, 512, 2048, bac86852
0, 14824, 14824, 512, 2048, cfa03cf6
0, 15336, 15336, 512, 2048, ec85b2cf
0, 15848, 15848, 512, 2048, 568417f0
0, 16360, 16360, 512, 2048, de55f656
0, 16872, 16872, 512, 2048, b4471f41
0, 17384, 17384, 512, 2048, a8b615d7
0, 17880, 17880, 512, 2048, 634e69bd
0, 18392, 18392, 512, 2048, 28bb1df8
0, 18904, 18904, 512, 2048, 7a2b2546
0, 19416, 19416, 512, 2048, dd67f369
0, 19928, 19928, 512, 2048, 72468c87
0, 20472, 20472, 512, 2048, 31358846
0, 20984, 20984, 512, 2048, 1b25d341
0, 21496, 21496, 512, 2048, 0f188f8e
0, 22008, 22008, 512, 2048, d4c28420
0, 22520, 22520, 512, 2048, c2a2cc15
0, 23032, 23032, 512, 2048, 97348c24
0, 23544, 23544, 512, 2048, 8266b6bd
0, 24056, 24056, 512, 2048, 9492736f
0, 24568, 24568, 512, 2048, a0eb4084
0, 25080, 25080, 512, 2048, 84f6ec09
0, 25592, 25592, 512, 2048, 050f991a
0, 26088, 26088, 512, 2048, deee9a7e
0, 26600, 26600, 512, 2048, 12b66ef5
0, 27112, 27112, 512, 2048, 38780750
0, 27624, 27624, 512, 2048, e309fbb0
0, 28136, 28136, 512, 2048, ee05c406
0, 28648, 28648, 512, 2048, fe965280
0, 29160, 29160, 512, 2048, 0e456d8f
0, 29672, 29672, 512, 2048, 8868c0a4
0, 30168, 30168, 512, 2048, b67200db
0, 30680, 30680, 512, 2048, 98452104
0, 31192, 31192, 512, 2048, 1c1d5dfa
0, 31704, 31704, 512, 2048, 3bb376e9
0, 32216, 32216, 512, 2048, 5cb27573
0, 32760, 32760, 512, 2048, ce8afcf9
0, 33272, 33272, 512, 2048, 671aafd3
0, 33784, 33784, 512, 2048, 6b0ef4ae
0, 34296, 34296, 512, 2048, 6190ea4e
0, 34808, 34808, 512, 2048, 9cc28eec
0, 35320, 35320, 512, 2048, 2fcb40a1
0, 35832, 35832, 512, 2048, c97c7941
0, 36344, 36344, 512, 2048, a8ddb89e
0, 36856, 36856, 512, 2048, 1f03cd39
0, 37368, 37368, 512, 2048, 0ae93b83
0, 37880, 37880, 512, 2048, 2f2c98d4
0, 38376, 38376, 512, 2048, 77460589
0, 38888, 38888, 512, 2048, a4d05c57
0, 39400, 39400, 512, 2048, df5b2b8d
0, 39912, 39912, 512, 2048, 19602dd2
0, 40424, 40424, 512, 2048, 53e32a7f
0, 40936, 40936, 512, 2048, 2f4acb24
0, 41448, 41448, 512, 2048, 29b3fd40
0, 41960, 41960, 512, 2048, 5cd68804
0, 42456, 42456, 512, 2048, 1a8765dd
0, 42968, 42968, 512, 2048, 5fbc5ae7
0, 43480, 43480, 512, 2048, d41ba16a
0, 43992, 43992, 512, 2048, 36614005
0, 44504, 44504, 512, 2048, b959f96a
0, 45048, 45048, 512, 2048, 764ffe4b
0, 45560, 45560, 512, 2048, 45e7b0d8
0, 46072, 46072, 512, 2048, e190ddd4
0, 46584, 46584, 512, 2048, 89c9b204
0, 47096, 47096, 512, 2048, 6d5e6559
0, 47608, 47608, 512, 2048, faa96307
0, 48120, 48120, 512, 2048, cf00e88c
0, 48632, 48632, 512, 2048, 9a7a02d3
...
}}}
If I encode Test.Laced.Big.TS.mka with libopus to Test.Laced.Big.TS.opus,
the resulting file is again defective:
{{{
Processing file "Test.Laced.Big.TS.opus"...
New logical stream (#1, serial: fb757aa2): type opus
Encoded with Lavf58.13.100
User comments section follows...
BPS-eng=1508966
DURATION-eng=00:00:30.006000000
NUMBER_OF_FRAMES-eng=2813
NUMBER_OF_BYTES-eng=5659756
_STATISTICS_WRITING_APP-eng=mkvmerge v22.0.0 ('At The End Of The
World') 64-bit
_STATISTICS_WRITING_DATE_UTC-eng=2018-04-22 04:35:34
_STATISTICS_TAGS-eng=BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
encoder=Lavc58.19.100 libopus
WARNING: Samples with negative granpos in stream 1
WARNING: Sample count ahead of granule (633600>633568) in stream 1
WARNING: Sample count ahead of granule (681600>681568) in stream 1
WARNING: Sample count ahead of granule (729600>729568) in stream 1
WARNING: Sample count ahead of granule (777600>777568) in stream 1
WARNING: Sample count ahead of granule (825600>825584) in stream 1
WARNING: Sample count ahead of granule (873600>873584) in stream 1
WARNING: Sample count ahead of granule (921600>921584) in stream 1
WARNING: Sample count ahead of granule (969600>969584) in stream 1
WARNING: Sample count ahead of granule (1017600>1017584) in stream 1
Opus stream 1:
Pre-skip: 312
Playback gain: 0 dB
Channels: 2
Original sample rate: 48000Hz
Packet duration: 20.0ms (max), 20.0ms (avg), 20.0ms (min)
Page duration: 1020.0ms (max), 1000.7ms (avg), 780.0ms (min)
Total data length: 387229 bytes (overhead: 0.887%)
Playback length: 0m:30.005s
Average bitrate: 103.2 kb/s, w/o overhead: 102.3 kb/s
Logical stream 1 ended
}}}
The reason for the "Samples with negative granpos" warning despite the
first sample not having a negative timestamp is that the ogg format
doesn't explicitly signal the granule position for every packet, but only
for every page (in which a packet is completed) and the ogg muxer uses a
page duration of 1s by default. In order to fill this 1s, one needs
48000-312 = 47688 samples from the input file and for this one needs the
first 94 packets. The first sample of the last of these 94 packets should
be sample number 47616 (zero-based), but according to the above framehash
it has the timestamp 47608. Consequently sample number 47687 has the
timestamp 47679 and therefore the granule position of the first page is
47679+1+312 = 47992 (the +1 comes from the fact that the granule position
indicates the position after decoding the whole content of the page) and
that is exactly what is in the output file. That of course means that the
output file is invalid.
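The same calculation as a short Python sketch. All inputs come from the
logs above; the variable names are only illustrative.

```python
import math

SAMPLE_RATE = 48000
PRE_SKIP = 312
DTS_FRAME = 512        # samples per decoded dts packet
OPUS_PAGE = 50 * 960   # samples in the packets of the first ogg page

needed = SAMPLE_RATE - PRE_SKIP                # 47688 input samples
packets = math.ceil(needed / DTS_FRAME)        # 94 input packets
ideal_start = (packets - 1) * DTS_FRAME        # 47616
actual_start = 47608                           # timestamp from the framehash
drift = actual_start - ideal_start             # -8 samples
last_sample_ts = (needed - 1) + drift          # 47679
granule = last_sample_ts + 1 + PRE_SKIP        # 47992
shortfall = OPUS_PAGE - granule                # 8 samples too few

print(packets, granule, shortfall)
```

The first page holds 48000 samples but its granule position is only 47992,
so working backwards ends up 8 samples below zero.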
Because not every output packet has a granule position, not every
gap/overlap in the samples that are fed to the libopus encoder ends up
having an influence on the output file. If the sum of the durations/number
of samples just happens to coincide with the granule position delta, then
everything's fine. This explains why
If one decodes the just created opus file with the reference decoder it
again ignores that a) ii) has the prerequisite of the 'end of stream' flag
being set and discards several samples. This leads to audible distortions
at around sample 633248 (= 633568 (from above opusinfo message) - 312
(pre-skip) - 8 (the number of samples from the end of the first page that
got skipped)).
f) Here is a bit more about the Matroska timestamps:
i) If one uses a timestamp-scale of 1000000 and no lacing the timestamps
are fine despite the second dts packet having a timestamp of 11ms whereas
the first dts packet has only a duration of 10 2/3 ms:
{{{
#format: frame checksums
#version: 2
#hash: CRC32
#software: Lavf58.13.100
#tb 0: 1/48000
#media_type 0: audio
#codec_id 0: pcm_s16le
#sample_rate 0: 48000
#channel_layout 0: 3
#channel_layout_name 0: stereo
#stream#, dts, pts, duration, size, hash
0, 0, 0, 512, 2048, 52bbda48
0, 512, 512, 512, 2048, 2b037e9f
0, 1024, 1024, 512, 2048, f69e3985
0, 1536, 1536, 512, 2048, 04f27523
0, 2048, 2048, 512, 2048, 0c9b0963
0, 2560, 2560, 512, 2048, de6e37eb
0, 3072, 3072, 512, 2048, 2230f372
0, 3584, 3584, 512, 2048, b4275a94
0, 4096, 4096, 512, 2048, e2efc7d5
0, 4608, 4608, 512, 2048, e6ff0c6f
0, 5120, 5120, 512, 2048, 43d5c355
0, 5632, 5632, 512, 2048, f689afdb
0, 6144, 6144, 512, 2048, 7ce06f4f
0, 6656, 6656, 512, 2048, d639e9c7
0, 7168, 7168, 512, 2048, 87aee60f
0, 7680, 7680, 512, 2048, 6e32d1e1
0, 8192, 8192, 512, 2048, 99b53229
0, 8704, 8704, 512, 2048, 46803053
0, 9216, 9216, 512, 2048, 4e4143b5
0, 9728, 9728, 512, 2048, 2116fa38
...
}}}
The result is the same whether the unlaced Matroska file has a default
duration or not.
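To illustrate the 11ms versus 10 2/3 ms remark: with a TimestampScale of
1000000 a Matroska file can only store whole milliseconds. The sketch below
assumes rounding to the nearest millisecond (half up); the source above only
confirms the value 11 for the second packet, so the rounding mode and the
function name are assumptions.

```python
SAMPLE_RATE = 48000
DTS_FRAME = 512  # samples per dts packet, i.e. 10 2/3 ms


def stored_ms(packet_index):
    """Start time of a packet in whole milliseconds, rounded half up."""
    numerator = packet_index * DTS_FRAME * 1000
    return (2 * numerator + SAMPLE_RATE) // (2 * SAMPLE_RATE)


def exact_ms(packet_index):
    """Exact start time of a packet in milliseconds."""
    return packet_index * DTS_FRAME * 1000 / SAMPLE_RATE


for n in range(5):
    print(n, stored_ms(n), round(exact_ms(n), 3))
```

Under that assumption the stored values would be 0, 11, 21, 32, 43 ms, so
the stored deltas alternate between 10 and 11 ms although every packet is
exactly 512 samples long.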
ii) As has already been said, for files with audio but no video track
mkvmerge uses a smaller TimestampScale (namely 20832 for 48kHz) by
default. With lacing the timestamps are as follows:
{{{
#format: frame checksums
#version: 2
#hash: CRC32
#software: Lavf58.13.100
#tb 0: 1/48000
#media_type 0: audio
#codec_id 0: pcm_s16le
#sample_rate 0: 48000
#channel_layout 0: 3
#channel_layout_name 0: stereo
#stream#, dts, pts, duration, size, hash
0, 0, 0, 512, 2048, 52bbda48
0, 512, 512, 512, 2048, 2b037e9f
0, 1024, 1024, 512, 2048, f69e3985
0, 1536, 1536, 512, 2048, 04f27523
0, 2048, 2048, 512, 2048, 0c9b0963
0, 2560, 2560, 512, 2048, de6e37eb
0, 3072, 3072, 512, 2048, 2230f372
0, 3584, 3584, 512, 2048, b4275a94
0, 4096, 4096, 512, 2048, e2efc7d5
0, 4608, 4608, 512, 2048, e6ff0c6f
0, 5120, 5120, 512, 2048, 43d5c355
0, 5632, 5632, 512, 2048, f689afdb
0, 6144, 6144, 512, 2048, 7ce06f4f
0, 6656, 6656, 512, 2048, d639e9c7
0, 7168, 7168, 512, 2048, 87aee60f
...
0, 22528, 22528, 512, 2048, c2a2cc15
0, 23040, 23040, 512, 2048, 97348c24
0, 23551, 23551, 512, 2048, 8266b6bd
0, 24063, 24063, 512, 2048, 9492736f
0, 24576, 24576, 512, 2048, a0eb4084
0, 25088, 25088, 512, 2048, 84f6ec09
...
}}}
So they are not perfect (there mustn't be any odd timestamps like 23551),
but way better.
Notice that the first eight packets also get different timestamps from the
ones they had in e) with the bigger timestamp-scale. This is despite them
being in the same lace and both laces starting at precisely the same time
(namely at absolute zero, which coincides for every TimestampScale). This
happens even when one trims the files to contain only eight packets (which
are all in the same lace). So a lower TimecodeScale in this case leads to
better results despite the file with the lower TimestampScale not
containing any more information about the timestamps than the file with
the default 1000000 TimecodeScale.
iii) Using a small TimestampScale and no lacing leads to good timestamps
(as expected).
iv) It seems that also the DefaultDuration is involved: If one uses the
file from e) (laced, TimestampScale 1000000) and deletes the
DefaultDuration header element (MKVToolNix has a tool named mkvpropedit
for that) one gets even worse timestamps (e.g. the 3384 should actually be
3584):
{{{
#format: frame checksums
#version: 2
#hash: CRC32
#software: Lavf58.13.100
#tb 0: 1/48000
#media_type 0: audio
#codec_id 0: pcm_s16le
#sample_rate 0: 48000
#channel_layout 0: 3
#channel_layout_name 0: stereo
#stream#, dts, pts, duration, size, hash
0, 0, 0, 512, 2048, 52bbda48
0, 504, 504, 512, 2048, 2b037e9f
0, 984, 984, 512, 2048, f69e3985
0, 1464, 1464, 512, 2048, 04f27523
0, 1944, 1944, 512, 2048, 0c9b0963
0, 2424, 2424, 512, 2048, de6e37eb
0, 2904, 2904, 512, 2048, 2230f372
0, 3384, 3384, 512, 2048, b4275a94
0, 4080, 4080, 512, 2048, e2efc7d5
0, 4584, 4584, 512, 2048, e6ff0c6f
0, 5064, 5064, 512, 2048, 43d5c355
0, 5544, 5544, 512, 2048, f689afdb
0, 6024, 6024, 512, 2048, 7ce06f4f
0, 6504, 6504, 512, 2048, d639e9c7
0, 6984, 6984, 512, 2048, 87aee60f
0, 7464, 7464, 512, 2048, 6e32d1e1
0, 8208, 8208, 512, 2048, 99b53229
...
}}}
And consequently one gets way more errors from opusinfo if one encodes the
above:
{{{
Processing file "I:\Neuer Ordner (2)\test.laced.big.ts.no.defdur.opus"...
New logical stream (#1, serial: 0bcf313d): type opus
Encoded with Lavf58.13.100
User comments section follows...
BPS-eng=1508966
DURATION-eng=00:00:30.006000000
NUMBER_OF_FRAMES-eng=2813
NUMBER_OF_BYTES-eng=5659756
_STATISTICS_WRITING_APP-eng=mkvmerge v22.0.0 ('At The End Of The
World') 64-bit
_STATISTICS_WRITING_DATE_UTC-eng=2018-04-22 04:35:34
_STATISTICS_TAGS-eng=BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
encoder=Lavc58.19.100 libopus
WARNING: Samples with negative granpos in stream 1
WARNING: Sample count behind granule (96960<96992) in stream 1
WARNING: Sample count behind granule (145920<145952) in stream 1
WARNING: Sample count behind granule (194880<194920) in stream 1
WARNING: Sample count ahead of granule (243840>243808) in stream 1
WARNING: Sample count behind granule (291840<291872) in stream 1
WARNING: Sample count behind granule (340800<340840) in stream 1
WARNING: Sample count behind granule (389760<389800) in stream 1
WARNING: Sample count behind granule (438720<438760) in stream 1
WARNING: Sample count behind granule (486720<486760) in stream 1
WARNING: Sample count behind granule (535680<535720) in stream 1
WARNING: Sample count behind granule (584640<584680) in stream 1
WARNING: Sample count ahead of granule (633600>633408) in stream 1
WARNING: Sample count ahead of granule (681600>681472) in stream 1
WARNING: Sample count behind granule (730560<730600) in stream 1
WARNING: Sample count ahead of granule (779520>779328) in stream 1
WARNING: Sample count ahead of granule (827520>827392) in stream 1
WARNING: Sample count ahead of granule (875520>875456) in stream 1
WARNING: Sample count behind granule (1118400<1118408) in stream 1
WARNING: Sample count behind granule (1167360<1167368) in stream 1
WARNING: Sample count ahead of granule (1216320>1216288) in stream 1
WARNING: Sample count behind granule (1264320<1264328) in stream 1
WARNING: Sample count behind granule (1313280<1313288) in stream 1
WARNING: Sample count ahead of granule (1362240>1362064) in stream 1
WARNING: Sample count ahead of granule (1410240>1410128) in stream 1
Opus stream 1:
Pre-skip: 312
Playback gain: 0 dB
Channels: 2
Original sample rate: 48000Hz
Packet duration: 20.0ms (max), 20.0ms (avg), 20.0ms (min)
Page duration: 1020.0ms (max), 1000.7ms (avg), 640.0ms (min)
Total data length: 387229 bytes (overhead: 0.887%)
Playback length: 0m:30.006s
Average bitrate: 103.2 kb/s, w/o overhead: 102.3 kb/s
Logical stream 1 ended
}}}
PS: Yes, mkvmerge was also buggy. In fact, I think it still is and will
soon open a bug report for it. For
[https://gitlab.com/mbunkus/mkvtoolnix/issues/2100 example] up until
version 15.0 it used lacing in BlockGroups with DiscardPadding (the result
was that the information to which audio packet the DiscardPadding actually
applies is lost; upon remuxing mkvmerge treated every packet in the block
as if the DiscardPadding element applied to it). But I have never ever
observed it creating bad output if the input file didn't have any issues.
PPS: And I also have some good news: The actual packets (without the
container stuff) of the files created in e) and f) iv) both completely
coincide with what one gets when one directly encodes test.dts. The only
thing that is truly lost is the end trimming.
--
Ticket URL: <https://trac.ffmpeg.org/ticket/4178#comment:18>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker