Sunday, October 9, 2011

How to play a wave file on the phone (C#)

Our goal here: We want the PC to call a designated phone number when something happens, and report it in easy-to-understand spoken audio snippets which are saved in the form of pre-recorded wave files.

PART I: Find out about your modem's capability.

Most modems can be classified with these characteristics:

Data/fax
Data/fax/voice
Data/fax/voice/speakerphone

In our case here, the modem has to support at least the voice feature. The easiest way to find out if your modem supports voice is to run HyperTerminal and send it AT commands.

Run HyperTerminal and use the following commands:

ATZ

The modem should respond with "OK"

AT+FCLASS=8

If your modem supports voice, it should come back with "OK". Otherwise, "ERROR".

If you are interested, you can type AT+FCLASS=?
The modem should respond with something like: 0,1,2,8.

0: Data

1,2: Fax

8: Voice
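
If you would rather do this check from code than by eye, the class list can be tested for class 8. The following is only a minimal sketch: the class and method names are our own, and it assumes you have already captured the modem's reply to AT+FCLASS=? as a string.

```csharp
using System;

public static class ModemClasses
{
    // Returns true if the comma-separated class list reported by
    // AT+FCLASS=? (for example "0,1,2,8") contains class 8 (voice).
    // Hypothetical helper, not part of the HomeZIX script.
    public static bool SupportsVoice(string fclassResponse)
    {
        if (string.IsNullOrEmpty(fclassResponse)) return false;

        // Some modems wrap the list in parentheses, e.g. "(0,1,2,8)".
        string[] parts = fclassResponse.Trim().Trim('(', ')').Split(',');
        foreach (string part in parts)
        {
            // Exact match only, so an entry such as "80" does not count.
            if (part.Trim() == "8") return true;
        }
        return false;
    }
}
```

For example, ModemClasses.SupportsVoice("0,1, 2, 8") returns true, while ModemClasses.SupportsVoice("0,1,2") returns false.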

If it turns out you don't have a voice modem, then we're stuck. If you want to go further with this approach, you'll need to install a modem with voice capability.

Now, if you do have a voice modem, you need to find out about its capabilities regarding the voice data formats it can handle. Use the following AT command to find out:

AT+VSM=? (Remember that AT+FCLASS=8 must be sent prior to sending this command.)

Our modem here comes back with the following:

AT+VSM=?
128,"8-BIT LINEAR",(7200,8000,11025)
129,"16-BIT LINEAR",(7200,8000,11025)
130,"8-BIT ALAW",(8000)
131,"8-BIT ULAW",(8000)
132,"IMA ADPCM",(7200,8000,11025)

OK


We interpret this as: our modem supports 5 different methods of encoding voice data. If you are interested in learning more about these methods, an online search for their names will turn up plenty of material. In this example, we will focus on the first method (the simplest one). This method uses 8-bit linear PCM (one uncompressed byte per sample; the matching 8-bit WAV format stores these as unsigned values) and supports the following sampling rates:

7200, 8000, or 11025 samples per second.
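
The reply lines shown above have a regular shape (a number, a quoted name, and a list of rates in parentheses), so they can be picked apart in code. Here is a rough sketch under that assumption; the VoiceFormat class is our own invention, and other modems may report extra fields that this does not handle.

```csharp
using System;
using System.Collections.Generic;

public class VoiceFormat
{
    public int Id;
    public string Name;
    public List<int> SampleRates = new List<int>();

    // Parses one line of the form: 128,"8-BIT LINEAR",(7200,8000,11025)
    // Returns null if the line does not look like a format entry
    // (for example the echoed command or the final "OK").
    public static VoiceFormat Parse(string line)
    {
        if (line == null) return null;
        int q1 = line.IndexOf('"');
        int q2 = line.LastIndexOf('"');
        if (q1 < 0 || q2 <= q1) return null;

        // First parenthesised group after the quoted name holds the rates.
        int p1 = line.IndexOf('(', q2);
        if (p1 < 0) return null;
        int p2 = line.IndexOf(')', p1);
        if (p2 < 0) return null;

        int id;
        string idText = line.Substring(0, q1).Trim().TrimEnd(',');
        if (!int.TryParse(idText, out id)) return null;

        VoiceFormat fmt = new VoiceFormat();
        fmt.Id = id;
        fmt.Name = line.Substring(q1 + 1, q2 - q1 - 1);
        foreach (string rate in line.Substring(p1 + 1, p2 - p1 - 1).Split(','))
        {
            int hz;
            if (int.TryParse(rate.Trim(), out hz)) fmt.SampleRates.Add(hz);
        }
        return fmt;
    }
}
```

Feeding it the first line of our modem's reply gives Id 128, Name "8-BIT LINEAR" and the three sampling rates; feeding it "OK" gives null.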

In theory, most of the useful content of the human voice falls below about 4kHz (the telephone network itself carries roughly 300Hz to 3.4kHz). Sampling theory (Nyquist) says that you have to sample at no less than twice the highest frequency you want to reconstruct, so at least 8kHz is needed for voice content up to 4kHz. Sampling faster than 8kHz leaves more margin against aliasing. However, higher sampling rates come at the price of a heavier data load.
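
To put numbers on that trade-off, the arithmetic can be written out as below. This is just an illustrative sketch (the class name is ours); it covers uncompressed mono audio only.

```csharp
using System;

public static class VoiceDataLoad
{
    // Bytes per second of uncompressed mono PCM audio.
    public static int BytesPerSecond(int sampleRate, int bitsPerSample)
    {
        return sampleRate * bitsPerSample / 8;
    }

    // Highest frequency a given sampling rate can represent (Nyquist limit).
    public static double NyquistLimitHz(int sampleRate)
    {
        return sampleRate / 2.0;
    }
}
```

At 8000 samples per second the Nyquist limit is 4000Hz and the load is 8000 bytes per second with 8-bit samples, or 16000 bytes per second with 16-bit samples. At 11025 samples per second the limit rises to 5512.5Hz, and the load to 11025 or 22050 bytes per second.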

Coming back to our example, we will choose 8000 samples per second, which is 8kHz. We tell the modem our selection by typing the command:

AT+VSM=128,8000 (Should get "OK" back.)

Following the above procedure, you should now know if your modem supports voice, and if so, what voice data format it uses. You have to choose the data format now in order to prepare the wave file.

PART II: Wave file - How to read to a buffer

Wave (or Wav) is the standard format for storing audio data on the PC. As software developers, we are interested in the internal structure of the file so that we can open and read data correctly before transmitting it over the phone line.

Fortunately, there are plenty of good articles on the internet addressing the wave file format. Here is one of them: http://www.sonicspot.com/guide/wavefiles.html

We show a small excerpt from that article (thanks to sonicspot) to show the typical layout of a wave file:

[Figure: typical wave file layout, from the sonicspot article]

From now on we will use HomeZIX software as a demonstration platform to show our C# code.

There are 2 files (download from the forum here)

"houston 8kHz Mono 8.wav" is the WAV file with the famous phrase: “Houston we’ve got a problem.”

The second attachment is the C# script (to be imported into HomeZIX) which reads the WAV file. Please download and save the wave file to your C:\ drive, and then load the script into HomeZIX (by dropping an Advanced C# block into your workspace, going to the C# source code window, right clicking on the editor, and selecting File->Open). The implementation is pretty straightforward. If you don’t care about the details, just focus on the Initialize() function where we open the file and read the wave data into the buffer.

As we pointed out in Part I, we just want to work with a PCM uncompressed wave file: mono, 8 bits per sample, and 8kHz sample rate. That’s why our C# script checks to see if we are reading an appropriate file. You may change the file type-checking functionality to accommodate different formats that your voice modem supports.
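
For readers who are not using HomeZIX, the same check can be sketched with nothing but System.IO. Note that this is not the script from the download: WavInfo is a hypothetical class of ours that walks the RIFF chunks of the file, assuming a seekable stream and a plain "fmt " chunk followed at some point by a "data" chunk.

```csharp
using System;
using System.IO;
using System.Text;

public class WavInfo
{
    public ushort FormatTag;      // 1 = PCM/uncompressed
    public ushort Channels;
    public uint SamplesPerSec;
    public ushort BitsPerSample;
    public byte[] Data;           // raw sample bytes from the "data" chunk

    // True for the one format this example sends to the modem.
    public bool IsModemFriendly()
    {
        return FormatTag == 1 && Channels == 1
            && BitsPerSample == 8 && SamplesPerSec == 8000;
    }

    // Walks the RIFF chunks, so it does not assume that the sample
    // data starts at a fixed offset. Requires a seekable stream.
    public static WavInfo Read(Stream stream)
    {
        BinaryReader rdr = new BinaryReader(stream);
        if (Tag(rdr) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
        rdr.ReadUInt32();  // overall size, not needed here
        if (Tag(rdr) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

        WavInfo info = new WavInfo();
        while (stream.Position + 8 <= stream.Length)
        {
            string id = Tag(rdr);
            uint size = rdr.ReadUInt32();
            // Chunks are padded to an even number of bytes.
            long next = stream.Position + size + (size & 1);
            if (id == "fmt ")
            {
                info.FormatTag = rdr.ReadUInt16();
                info.Channels = rdr.ReadUInt16();
                info.SamplesPerSec = rdr.ReadUInt32();
                rdr.ReadUInt32();  // average bytes per second
                rdr.ReadUInt16();  // block align
                info.BitsPerSample = rdr.ReadUInt16();
            }
            else if (id == "data")
            {
                info.Data = rdr.ReadBytes((int)size);
            }
            stream.Position = Math.Min(next, stream.Length);
        }
        return info;
    }

    private static string Tag(BinaryReader rdr)
    {
        return Encoding.ASCII.GetString(rdr.ReadBytes(4));
    }
}
```

A caller would open the file with File.OpenRead, pass the stream to WavInfo.Read, and refuse to continue unless IsModemFriendly() returns true and Data is not null.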

Put HomeZIX in RUN mode (by selecting Manage->Run) and click on the Advanced Script block. You should then see debug output, similar to what we are seeing here:

[Screenshot: HomeZIX debug output]

Now, the voice data has been loaded into the buffer. We will send it over the phone line so that whoever answers the phone will hear: “Houston, we’ve got a problem.”

PART III: Putting things together

In part I we examined the modem to verify that it supported voice. If so, we took a note about the voice data format that we would use. In the second part, we prepared a wave file and implemented a piece of code in C# to be used by HomeZIX to read the wave file into a buffer. Now, it’s time to put things together to send out that buffer as an audio stream over the phone line to a designated phone number.

In our example here, we are connecting to a voice modem via COM10. It is an old one we got from a local store. We have prepared a wave file, “houston 8kHz Mono 8.wav”, which has the format: PCM/uncompressed, 8 bits per sample, single channel (mono), 8000 samples per second. We dropped a virtual switch into the HomeZIX workspace to control when to start calling the number and playing the wave file. Of course, it’s only for demonstration purposes. In real use, it should call the number and choose the appropriate wave file to play according to particular events such as motion detected, temperature too high, door open, etc.
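
Choosing a wave file per event could be as simple as a lookup table. The sketch below is purely illustrative: the event names and file paths are made up, and how events reach your script depends on your HomeZIX setup.

```csharp
using System.Collections.Generic;

public static class AlertSounds
{
    // Hypothetical event names and file paths, for illustration only.
    private static readonly Dictionary<string, string> s_files =
        new Dictionary<string, string>
        {
            { "motion", "C:\\alerts\\motion detected.wav" },
            { "temperature", "C:\\alerts\\temperature too high.wav" },
            { "door", "C:\\alerts\\door open.wav" }
        };

    // Returns the wave file for an event, or null if none is configured.
    public static string FileFor(string eventName)
    {
        string path;
        return s_files.TryGetValue(eventName, out path) ? path : null;
    }
}
```

Each file in such a table would still have to pass the same format check as our demonstration file before being sent to the modem.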

Download the script, import it into HomeZIX, and take a moment to go through the implementation:

Initialize


public void Initialize()
{   // Required function. DO NOT remove or change the name.
    // Called once when the script starts executing.
    string fileName = "C:\\houston 8kHz Mono 8.wav";
    if (!System.IO.File.Exists(fileName))
    {
        Debug("Wave file not found.");
        return;
    }

    // Open the file read-only; we never write to it.
    System.IO.FileStream strm = new System.IO.FileStream(fileName, System.IO.FileMode.Open, System.IO.FileAccess.Read);
    System.IO.BinaryReader rdr = new System.IO.BinaryReader(strm);
    Wave.WAVEFORMATEX wfmt = new Wave.WAVEFORMATEX();
    wfmt.SeekTo(strm);

    // Read in the WAVEFORMATEX structure and report what we found.
    wfmt.Read(rdr);
    Debug("Wave file information:");
    Debug("Wave file encoding: " + wfmt.wFormatTag.ToString());
    Debug("Channels: " + wfmt.nChannels.ToString());
    Debug("Bits per Sample: " + wfmt.wBitsPerSample.ToString());
    Debug("Sampling rate: " + wfmt.nSamplesPerSec.ToString());

    // Accept only the format we selected on the modem in Part I:
    // PCM (format tag 1), mono, 8 bits per sample, 8000 samples per second.
    if ((wfmt.wFormatTag == 1) && (wfmt.nChannels == 1) && (wfmt.wBitsPerSample == 8) && (wfmt.nSamplesPerSec == 8000))
    {
        uint dataLength = (uint)(rdr.BaseStream.Length - Wave.WAVEFORMATEX.WF_OFFSET_DATA);
        m_whdr = new Wave.WAVEHDR();
        m_whdr.Read(rdr, dataLength, wfmt.nBlockAlign);
        Debug("Wave file data has been read successfully.");
    }
    else
    {
        Debug("Unsupported wave file.");
        Debug("This example supports only [PCM/Uncompressed], [Mono], [8 bits per sample], [8kHz samples per second] wave files.");
    }

    // Closing the reader also closes the underlying file stream.
    rdr.Close();

    // Initializing COM port
    m_serialPort = new System.IO.Ports.SerialPort();
    m_serialPort.PortName = "COM10";
    m_serialPort.BaudRate = 115200;
    m_serialPort.DataBits = 8;
    m_serialPort.StopBits = System.IO.Ports.StopBits.One;
    m_serialPort.Parity = System.IO.Ports.Parity.None;
    m_serialPort.Handshake = System.IO.Ports.Handshake.None;
    m_serialPort.DtrEnable = true;
    try
    {
        m_serialPort.Open();
        Debug("COM port is ready.");
    }
    catch (System.Exception ex)
    {
        Debug("Error opening the COM port: " + ex.Message);
    }
    m_talking = false;
}

We have seen this function in Part II. This time we added initialization for the COM port. In our case, serial port COM10 is connected to our external voice modem. Most modems auto-detect the speed of a serial connection, so the settings here (115200bps, 8 data bits, 1 stop bit, no parity, no handshaking) will likely work. However, check the modem's manual to make sure you set them correctly. HyperTerminal is a handy tool for testing your setup.
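
One thing worth verifying when you pick the serial settings is that the link can carry the audio at least as fast as the modem plays it. The sketch below (our own helper, not part of the script) assumes a plain asynchronous serial link, where every byte also costs one start bit plus the stop and parity bits.

```csharp
public static class SerialBudget
{
    // Payload bytes per second on an asynchronous serial link.
    // Each byte costs: 1 start bit + data bits + parity bit (if any) + stop bits.
    public static double BytesPerSecond(int baudRate, int dataBits, bool parity, double stopBits)
    {
        double bitsPerByte = 1 + dataBits + (parity ? 1 : 0) + stopBits;
        return baudRate / bitsPerByte;
    }

    // True if an 8N1 link at the given speed can keep up with the audio stream.
    public static bool CanCarry(int baudRate, int audioBytesPerSecond)
    {
        return BytesPerSecond(baudRate, 8, false, 1) >= audioBytesPerSecond;
    }
}
```

With our settings, 115200bps at 8N1 gives 11520 bytes per second, which is comfortably above the 8000 bytes per second that 8-bit audio at 8kHz needs. The same link would not keep up with 16-bit audio at 11025 samples per second (22050 bytes per second).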

Execute

This function detects if the virtual switch has been turned on. If so, it starts the calling process. Because this function is called every second, we don’t want to monopolize the processing time of the main program by processing the modem connection here. Instead, we spawn another thread to do that job.

public void Execute()
{   // Required function. DO NOT remove or change the name.
    // Called every second.
    int status = GetStatus("Virtual switch..0.0");

    // Rising edge: the virtual switch has just been turned on.
    if ((m_virtualSwitchPreviousState == 0) && (status > 0))
    {
        if (!m_talking)
        {
            // Make the call on a separate thread so this function returns quickly.
            m_talking = true;
            m_phoneThread = new System.Threading.Thread(new System.Threading.ThreadStart(WaveToPhone));
            m_phoneThread.Start();
        }
        m_virtualSwitchPreviousState = status;
    }

    // Falling edge: the virtual switch has just been turned off.
    // Note that WaveToPhone also clears m_talking when it finishes.
    if ((m_virtualSwitchPreviousState > 0) && (status == 0))
    {
        m_talking = false;
        m_virtualSwitchPreviousState = status;
    }
}

WaveToPhone

This thread has 3 parts: preparing the modem, sending wave data, and finally terminating the session.

Preparing the modem: These are commands to put the modem in voice mode with our selected voice data encoding format. After each command has been sent to the modem, the script waits for the response. We didn’t check the response here since we know it works (from going through Part I.) We simply print out the debug string. The last steps of this phase are to call the number (“ATDT number”) and switch the modem to voice data sending mode (“AT+VTX”).

private void WaveToPhone()
{
    if ((m_serialPort != null) && (m_serialPort.IsOpen))
    {
        // Preparing the modem: reset, voice mode, 8-bit linear at 8000 samples per second.
        SendCommand("ATZ");
        GetResponse(1, true);
        SendCommand("AT+FCLASS=8");
        GetResponse(1, true);
        SendCommand("AT+VSM=128,8000");
        GetResponse(1, true);

        // Call the number, then switch the modem to voice data sending mode.
        SendCommand("ATDT" + m_phoneNumber);
        GetResponse(60, false);

        SendCommand("AT+VTX");
        System.Threading.Thread.Sleep(500);

        // Sending the wave data: play the message 3 times, pausing after each.
        for (int i = 0; i < 3; i++)
        {
            SendVoiceData();
            System.Threading.Thread.Sleep(1000);
        }

        // Terminating the session: 0x10 0x03 switches off the sending mode,
        // and ATH hangs up.
        byte[] terminator = new byte[] { 0x10, 0x03 };
        m_serialPort.Write(terminator, 0, terminator.Length);

        SendCommand("ATH");
    }
    m_talking = false;
}
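
The script above prints the modem's responses without checking them. If you do want to check them, one approach is to read lines until a final result code shows up. The following is only a sketch of that idea: ModemReply is our own hypothetical helper (it is not the GetResponse function used in the script), it reads from any TextReader, and the list of result codes is the usual Hayes-style set plus VCON, which some voice modems report. Check your modem's manual for the codes it actually uses.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ModemReply
{
    // Result codes that end a command's response (assumed set; modems vary).
    private static readonly string[] s_finalCodes =
        { "OK", "ERROR", "CONNECT", "VCON", "NO CARRIER", "BUSY", "NO DIALTONE", "NO ANSWER" };

    // Reads lines until a final result code arrives and returns that code,
    // or null if the reader runs out first. Other non-empty lines (the
    // echoed command, informational text) are collected in 'info'.
    public static string ReadFinalCode(TextReader reader, List<string> info)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0) continue;
            foreach (string code in s_finalCodes)
            {
                // "CONNECT 9600" and similar carry extra text after the code.
                if (line == code || line.StartsWith(code + " ", StringComparison.Ordinal))
                    return code;
            }
            if (info != null) info.Add(line);
        }
        return null;
    }
}
```

In a test you can feed it a System.IO.StringReader holding a captured reply. Wiring it to the real serial port, including what to do when the modem stays silent, is left out of this sketch.
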

Sending the wave data.

In the Initialize function, we already read the data into the buffer from a wave file. It’s time now to transfer that buffer as an audio stream to the modem. However, because the modem likely cannot handle a large amount of data at once, we send the data in chunks of 1024 bytes each. This keeps the data flowing steadily so that the voice sounds smooth on the other end. In our example here, we also repeat the same data buffer 3 times.
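
To get a feel for what a 1024-byte chunk means in terms of audio, the arithmetic is sketched below. The helper names are ours, and the figures assume uncompressed mono audio.

```csharp
public static class ChunkTiming
{
    // Playback time in milliseconds for 'byteCount' bytes of mono PCM audio.
    public static double PlaybackMs(int byteCount, int sampleRate, int bitsPerSample)
    {
        int bytesPerSecond = sampleRate * bitsPerSample / 8;
        return byteCount * 1000.0 / bytesPerSecond;
    }

    // Number of writes needed to send 'dataLength' bytes in blocks of
    // 'blockSize' bytes (the last block may be shorter).
    public static int ChunkCount(int dataLength, int blockSize)
    {
        return (dataLength + blockSize - 1) / blockSize;
    }
}
```

At 8 bits per sample and 8000 samples per second, one 1024-byte chunk holds 128 milliseconds of audio, and a 3-second clip (24000 bytes) takes 24 writes.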

private void SendVoiceData()
{
    if ((m_whdr == null) || (m_serialPort == null) || (!m_serialPort.IsOpen)) return;
    const int maxBlockSize = 1024;
    int dataLength = m_whdr.data.Length;
    if (dataLength == 0) return;
    try
    {
        Debug("Start sending WAVE data...");
        for (int offset = 0; offset < dataLength; offset += maxBlockSize)
        {
            // The last block may be shorter than 1024 bytes; this also
            // covers files that are smaller than one block.
            int blockSize = System.Math.Min(maxBlockSize, dataLength - offset);
            m_serialPort.Write(m_whdr.data, offset, blockSize);
        }
        Debug("Done sending WAVE data.");
    }
    catch (System.Exception ex)
    {
        Debug("Error sending WAVE data: " + ex.Message);
    }
}

Terminating the call.

Once the wave data has been sent, we terminate the connection by sending 0x10 0x03 to the modem to switch off the sending mode, and then hanging up the phone by sending “ATH”. Note that if you plan to have multiple scripts handling different wave files for different events, you will have to close the COM port here as well.
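
Because 0x10 has a special meaning to the modem while it is in sending mode, it is worth asking what happens when a sample in the wave data itself equals 0x10. As far as we know, voice modems that follow the common convention (ITU-T V.253 and its predecessors) expect such a byte to be doubled so that it is not mistaken for the start of a command. Our script does not do this, so treat the following as a sketch to check against your modem's manual; the DleShield class is our own.

```csharp
using System.Collections.Generic;

public static class DleShield
{
    private const byte DLE = 0x10;
    private const byte ETX = 0x03;

    // Returns a copy of 'audio' in which every 0x10 byte is doubled,
    // assuming the modem treats a doubled 0x10 as a single data byte.
    public static byte[] Escape(byte[] audio)
    {
        List<byte> result = new List<byte>(audio.Length + 16);
        foreach (byte b in audio)
        {
            result.Add(b);
            if (b == DLE) result.Add(DLE);
        }
        return result.ToArray();
    }

    // The two-byte sequence that switches off the sending mode.
    public static byte[] EndOfTransmit()
    {
        return new byte[] { DLE, ETX };
    }
}
```

If your modem does need this, the escaped copy would be produced once after reading the wave file and sent in place of the raw buffer.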