  SSL

    I'm a layman, so I'm just wondering if anyone's familiar with binaural modulation perception. I'd need some clarification on the terminology used for the modulation spectrogram and why it's significant per se. Thanks.
  Goodgaar

    Go ahead and ask. I'm not sure I'll be able to answer your question, but if it falls within something related to what I work on, it should be OK.
  SSL

    Thank you,

    I'd like to ask: in a modulation spectrogram, why is the signal lowpass filtered (in general, only spectral components at or below 400 Hz are considered), discarding all the speech information at high frequencies? What advantages does it have over a regular spectrogram? And what are the specs of the filter bank (required in the process of deriving a modulation spectrogram), e.g., bandwidths, frequency ranges, overlap, etc.? Thanks a lot.
  cuong

    I'm not sure I remember this exactly, mind you; I'm just typing what I recall.

    A spectrogram is a representation of the speech signal by frequency, amplitude, and time. The lower frequencies usually have higher amplitude (higher energy). The three lowest frequencies are usually called the "fundamental frequencies"; they carry the main information (what exactly counts as "main" I don't remember :) ), to the point that using only these three low-frequency parts of the spectrogram is enough to reconstruct speech (text to speech) at an acceptably good level.

  Sintayo

    Here is an excellent introductory paper with a lot of references:

    They don't use a single low pass filter at 400 Hz.
    They might use TEN 400 Hz wide bandpass filters to cover the frequency range of 0 to 4 kHz. Then they sample the power output of these at a lower frequency like 80 Hz.
    The goal of this is to convert speech into phonemes, either for computer recognition or data compression.
  SSL

    Thanks for the very useful paper. Better to stick with ICASSP when dealing with DSP. I've been confusing myself with papers in the neuroscience literature. No offense, but their signal processing wording is "different". However, there's a slight difference between the modulation spectrogram in that literature (e.g., the one in "Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction" by Kollmeier & Koch) and the one in the given ICASSP paper. The former does apply a lowpass and only considers frequency components below 400 Hz.
    In the ICASSP one, it seems a little questionable to me that they only extract the first coefficient of the FFT (i.e., the 4 Hz component of each 250 ms window of the 80 Hz sampled data). Of course this gets rid of all the noise at high frequencies, which gives the benefits over a regular spectrogram that they show in the paper. But again, why would they discard all the high-frequency content? I'm not familiar with the applications of the modulation spectrogram, but in my opinion a regular spectrogram is no worse, and arguably conveys more useful information. Correct me if I'm wrong, and most likely I am, since this is just a quick thought from a very cursory reading.
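
    To spell out the arithmetic behind the 4 Hz figure: a 250 ms window of data sampled at 80 Hz holds 20 samples, so the DFT bins are 80 / 20 = 4 Hz apart. A minimal sketch, assuming "first coefficient" means the first non-DC bin. The envelope below is made up for illustration, and its 4 Hz and 20 Hz components and amplitudes are not taken from the paper:

```python
import cmath
import math

ENV_RATE = 80     # envelope sampling rate in Hz
WINDOW_S = 0.25   # analysis window length in seconds

N = round(ENV_RATE * WINDOW_S)   # 20 envelope samples per window
BIN_HZ = ENV_RATE / N            # 4.0 Hz between adjacent DFT bins


def dft_coeff(x, k):
    """k-th DFT coefficient of the sequence x."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


# Hypothetical band envelope: a constant level plus 4 Hz and 20 Hz modulation.
envelope = [1.0
            + 0.5 * math.cos(2 * math.pi * 4 * t / ENV_RATE)
            + 0.3 * math.cos(2 * math.pi * 20 * t / ENV_RATE)
            for t in range(N)]

mag_4hz = abs(dft_coeff(envelope, 1))    # bin 1 sits at 1 * BIN_HZ = 4 Hz
mag_20hz = abs(dft_coeff(envelope, 5))   # bin 5 sits at 5 * BIN_HZ = 20 Hz

print(N, BIN_HZ, round(mag_4hz, 6), round(mag_20hz, 6))
```

    Keeping only bin 1 retains the 4 Hz modulation and drops the 20 Hz component entirely, which is the discarding of faster modulations that I'm questioning here.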

