Creating a Built in 4 Score device

Started by lifewithmatthew, September 14, 2021, 11:57:14 am


lifewithmatthew

October 04, 2021, 09:58:39 am #30 Last Edit: October 04, 2021, 10:04:28 am by lifewithmatthew
If all I was working with was one controller, then I could probably get away with a single interrupt.  The issue I'm hitting is that the NES pulses the latch command for about 12 microseconds.  To make sure I caught it I would need to have my code execute, ideally for all four controllers, in less than 12 microseconds so that I could load in controller data and not be writing to the variable as it's being read.

So right now I toggle the latch for all four controllers in one go, then read the first bit from each controller. Then I pulse the clock for all four controllers in one go and read the next bit for each controller.

The entire process takes 0.025 milliseconds or 25 microseconds.  Interestingly, the biggest holdup I've faced is the response speed of the 4021 shift register.  I had my code setting the latch in one clock cycle then unsetting it the next, and it was too fast for the 4021 to register.  I had to slow it down and use code that set it in 3 cycles and unset it in 3 cycles.  Same with the clock pulse: doing it too fast messed things up.

Even going slower I still have to do a double read of the controller bits to get a good read.

I'm figuring my read time by throwing my code into a for loop, having it execute 3000 times, and comparing the time before it started to the time after it finished.  Over 3000 loops it takes 72,972 microseconds (about 24.3 microseconds per loop).

Now some caveats here: there is some overhead in a for loop (around 0.25 microseconds per loop), and there is just shy of 3 microseconds of overhead associated with the command used to return the current micros.  But even if I was as generous as possible I can't get the code to read all 4 controllers in under 12 microseconds.
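The arithmetic behind that estimate can be written out as a small C helper. This is only a sketch: the function name `per_read_us` is made up for illustration, and the overhead figures are the rough ones quoted above, not fresh measurements.

```c
#include <stdint.h>

/* Average time per loop iteration in microseconds, after removing the
 * one-off cost of the timer call and the per-iteration loop overhead.
 * The name and parameters are illustrative assumptions. */
double per_read_us(uint32_t total_us, uint32_t iterations,
                   double timer_call_us, double loop_overhead_us)
{
    double net_total = (double)total_us - timer_call_us;
    return net_total / (double)iterations - loop_overhead_us;
}
```

Calling `per_read_us(72972, 3000, 3.0, 0.25)` gives roughly 24 microseconds per full read, which is still about twice the 12 microsecond latch window.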

One thing I was thinking about is changing the code to read a single controller at a time and do a latch check after each controller read (one controller read takes 10.9 microseconds).  In theory I could just hold the last read values of a controller and everything *should* be okay.  Maybe a controller press comes in 12 microseconds later than when you pressed it... but that should only really make a difference if you need to do something frame perfect, and if you're doing that then I doubt you'd go through a four-score device either.

Alternatively I could just read all the controllers and then have my code wait for the latch command.  Worst case scenario I miss four cycles reading each controller and then let's say I miss the 5th latch by a moment, so every 6th latch pulse I send new controller data and the rest of the time I send "stale" data.  That would mean a button press or release could be delayed by up to 75 microseconds.  And suddenly I'm questioning my quest for immediate responses....

P

12 μs for all controllers, sounds tight.

Yeah I don't think you need to worry about re-reading all controllers multiple times within the same frame. Humans won't notice any changes in that short span of time anyway. 2 frames of input lag is surely noticeable, but they say not even world class speedrunners can detect 1 frame of lag or less.

emerson

Part 1/3

Quote from: lifewithmatthew on October 04, 2021, 09:58:39 amTo make sure I caught it I would need to have my code execute, ideally for all four controllers, in less than 12 microseconds so that I could load in controller data and not be writing to the variable as it's being read.

Does this mean you are attempting to read and store the button states of all four controllers between the latch pulse and first clock pulse from the console to the arduino?

Quote from: lifewithmatthew on October 04, 2021, 09:58:39 amSo right now toggle the latch for all four controllers in one go, then read the first bit from each controller. then pulse the clock for all four controllers in one go and read the next bit for each controller.

This implies you have all four controller clock lines tied to a single output from the arduino, is that correct? I also assume you have all four latch pins of the controllers tied together. Are these tied directly to $4016.OUT0 from the console or are they controlled by an arduino pin?

If the answer to both of these is yes then I can see why you are having timing issues. As you said it's simply too much at once. I can see the logic in taking this approach, but let's break down what we know about how the four score is read and use that to form a state machine.

What we know is:
- every time a latch pulse is received the controllers reset
- $4016.CLK pulses 0-7 read cont_1, pulses 8-15 read cont_3, and pulses 16-23 read sig_1
- $4017.CLK pulses 0-7 read cont_2, pulses 8-15 read cont_4, and pulses 16-23 read sig_2
- sig_1 and sig_2 are always the same value (aka constants)
- the idle state for $4016.OUT0 is low, and idle states for $4016.CLK and $4017.CLK are high
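The read order in the list above can be sketched as a lookup from clock count to data source. This is a minimal illustration in plain C that can be checked on a PC; the enum and function names are hypothetical and not part of any Arduino library.

```c
#include <stdint.h>

/* Which source answers a given clock pulse on one console port.
 * All names are illustrative assumptions. */
typedef enum {
    SRC_FIRST_PAD,   /* cont_1 on $4016, cont_2 on $4017 */
    SRC_SECOND_PAD,  /* cont_3 on $4016, cont_4 on $4017 */
    SRC_SIGNATURE,   /* sig_1 on $4016, sig_2 on $4017 */
    SRC_NONE         /* beyond the 24 pulses listed above */
} read_source;

read_source four_score_source(uint8_t clk_count)
{
    if (clk_count <= 7)  return SRC_FIRST_PAD;
    if (clk_count <= 15) return SRC_SECOND_PAD;
    if (clk_count <= 23) return SRC_SIGNATURE;
    return SRC_NONE;
}
```

The same function serves both ports because $4016 and $4017 follow the same 8/8/8 layout; only the controllers and signature behind each slot differ.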

We now have everything we need to know about the four score and we can start designing the code to meet these requirements. First let's look at the latch pulse. Because every controller requires the latch pulse simply tie $4016.OUT0 directly to every controller. We also need the latch pulse to reset our state machine variables so tie the latch signal to an input pin on the arduino as well. Because we know nothing can happen until the latch pulse is received by the four score we can use a while loop to wait for it.

Spoiler
//4016.OUT0 is an input pin
//

while (4016.OUT0 == 0);  //loop while latch_in = 0
4016_cnt = 0;            //reset $4016.CLK counter
4017_cnt = 0;            //reset $4017.CLK counter
[close]

That's everything the latch pulse needs to do in code. Again, there is some overhead when programming in C but I imagine a 4MHz to 8MHz system clock should certainly execute this within 12us.

emerson

Part 2/3

Now let's look at the clock and data signals as these work in unison. Trying to cram all 24 controller reads between the latch pulse and first clock pulse would be like trying to execute an entire game engine within vblank. Instead, let's utilize the time we are given between clock pulses to handle only one bit from only one controller. That's how the hardware handles it right?

Tie $4016.CLK and $4017.CLK to inputs on the arduino as these are required to increment their respective clock counter variables. These variables are then used to determine which controller is being queried at that instant and send only that controller a clock pulse. We also expect to receive a bit from the controller once it is clocked. Therefore, each controller has a dedicated clock output and data input. Once all this is done we need to increment the respective clock counter for the next loop iteration.

Repeat this approach for both $4016 and $4017

Spoiler
//4016.CLK, CONT1.D0, CONT3.D0 are input pins
//4016.D0, CONT1.CLK, CONT3.CLK are output pins
//

if (4016.CLK == 0){
   switch (4016_cnt){
      case <= 7:              //cont_1
         CONT1.CLK = 0;       //instead of using a loop counter for a few microseconds just repeat the instruction
         CONT1.CLK = 0;       //repeat as needed
         CONT1.CLK = 1;       //release clock line
         CONT1.CLK = 1;       //if necessary allow time for data to stabilize
         4016.D0 = CONT1.D0;  //shift current cont1 button out to console
         break;
      case <= 15:             //cont_3
         CONT3.CLK = 0;
         CONT3.CLK = 0;
         CONT3.CLK = 1;
         CONT3.CLK = 1;
         4016.D0 = CONT3.D0;  //shift current cont3 button out to console
         break;
      case <= 23:             //sig_1
         //shift signature constant x number of bits by
         //subtracting the existing count from the total count
         4016.D0 = 0x01 AND (0b11101111 >> (23 - 4016_cnt));
         break;
      default:
         4016.D0 = 1;         //additional clock pulses always return 1
         break;
   }
   4016_cnt++;                //increment clock counter by 1
}
[close]

I repeated the same command when clocking to allow enough time for the 4021 to return a bit. It looks goofy but when it compiles to assembly the code will execute faster than a for next loop. If available, the best approach would be to use something like "nop();"
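The signature case is the least obvious line in the pseudocode above, so here it is pulled out as a plain C function that can be checked on a PC. The function name is made up for illustration, and the constant in the usage note is simply the one from the pseudocode, not a verified value.

```c
#include <stdint.h>

/* Returns the signature bit for a given clock count.
 * clk_count is expected to be in the range 16..23; count 16 selects the
 * most significant bit of the constant and count 23 the least significant. */
uint8_t signature_bit(uint8_t signature, uint8_t clk_count)
{
    return (uint8_t)((signature >> (23u - clk_count)) & 0x01u);
}
```

With the constant 0b11101111 (0xEF), the only count that returns 0 is 19, which is the fourth bit of the signature byte.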

emerson

Part 3/3
Currently once the code is past the latch pulse loop it is expecting clock pulses. Because we know the idle states of all three inputs from the console we can (and should) test them all at once. Simply connect $4016.OUT0, $4016.CLK and $4017.CLK to the same input register on the arduino.

Spoiler
//assume PORTA is an arduino I/O register
//
//PORTA = 0bxxxxx110
//               ||+-- $4016.OUT0, idle low
//               |+--- $4016.CLK, idle high
//               +---- $4017.CLK, idle high
//
//all three are input pins
//

FourScoreEngine:
   while (0x06 == (PORTA AND 0x07)); //loop until any input is received
   switch (PORTA AND 0x07){
      case 0b00000111:              //latch pulse detected
         4016_cnt = 0;              //reset $4016.CLK counter
         4017_cnt = 0;              //reset $4017.CLK counter
         break;

      case 0b00000100:              //4016.CLK pulse detected (clock low, latch idle)
         switch (4016_cnt){
            case <= 7:              //cont_1
               CONT1.CLK = 0;       //instead of using a loop counter for a few microseconds just repeat the instruction
               CONT1.CLK = 0;       //repeat as needed
               CONT1.CLK = 1;       //release clock line
               CONT1.CLK = 1;       //if necessary allow time for data to stabilize
               4016.D0 = CONT1.D0;  //shift current cont1 button out to console
               break;
            case <= 15:             //cont_3
               CONT3.CLK = 0;
               CONT3.CLK = 0;
               CONT3.CLK = 1;
               CONT3.CLK = 1;
               4016.D0 = CONT3.D0;  //shift current cont3 button out to console
               break;
            case <= 23:             //sig_1
               //shift signature constant x number of bits by
               //subtracting the existing count from the total count
               4016.D0 = 0x01 AND (0b11101111 >> (23 - 4016_cnt));
               break;
            default:
               4016.D0 = 1;         //additional clock pulses always return 1
               break;
         }
         4016_cnt++;                //increment clock counter by 1
         break;

      case 0b00000010:              //4017.CLK pulse detected (clock low, latch idle)
         switch (4017_cnt){
            case <= 7:              //cont_2
               CONT2.CLK = 0;       //instead of using a loop counter for a few microseconds just repeat the instruction
               CONT2.CLK = 0;       //repeat as needed
               CONT2.CLK = 1;       //release clock line
               CONT2.CLK = 1;       //if necessary allow time for data to stabilize
               4017.D0 = CONT2.D0;  //shift current cont2 button out to console
               break;
            case <= 15:             //cont_4
               CONT4.CLK = 0;
               CONT4.CLK = 0;
               CONT4.CLK = 1;
               CONT4.CLK = 1;
               4017.D0 = CONT4.D0;  //shift current cont4 button out to console
               break;
            case <= 23:             //sig_2
               //shift signature constant x number of bits by
               //subtracting the existing count from the total count
               4017.D0 = 0x01 AND (0b11110111 >> (23 - 4017_cnt));
               break;
            default:
               4017.D0 = 1;         //additional clock pulses always return 1
               break;
         }
         4017_cnt++;                //increment clock counter by 1
         break;

      default: break;               //ignore any other input combination
   }
   goto FourScoreEngine;            //loop forever
[close]

A total of 2 variables and 13 I/O pins are required for this solution (1 latch input, 2 clock inputs, 4 clock outputs, 4 data inputs, 2 data outputs).

lifewithmatthew

Quote from: emerson on October 04, 2021, 05:09:29 pmPart 1/3

Quote from: lifewithmatthew on October 04, 2021, 09:58:39 amTo make sure I caught it I would need to have my code execute, ideally for all four controllers, in less than 12 microseconds so that I could load in controller data and not be writing to the variable as it's being read.

Does this mean you are attempting to read and store the button states of all four controllers between the latch pulse and first clock pulse from the console to the arduino?


Oh my no, lol.  I did word that poorly. I was saying that if I only used one interrupt I would be worried about needing to execute all my code for the controller reads in less than 12 microseconds.
 
Quote from: emerson on October 04, 2021, 05:09:29 pm
Quote from: lifewithmatthew on October 04, 2021, 09:58:39 amSo right now toggle the latch for all four controllers in one go, then read the first bit from each controller. then pulse the clock for all four controllers in one go and read the next bit for each controller.

This implies you have all four controller clock lines tied to a single output from the arduino, is that correct? I also assume you have all four latch pins of the controllers tied together. Are these tied directly to $4016.OUT0 from the console or are they controlled by an arduino pin?

Yes, I have all four controller clock lines tied to a single output on the arduino and all four latch pins tied to a different pin.  The four score ties all the latch pins together, but has dedicated clock pins for each controller.  My current method for reading it was to send the latch, then read the PORTC, then send clock pulse and read PORTC, rinse and repeat until I had all 8 bits.  Since all my data lines are on PORTC, I can read all four controllers in one go and then write out the appropriate bits to my controller variables.
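The "read one port, then sort the bits into per-controller variables" step can be sketched in plain C. This is an illustration only: the function name is hypothetical, and the wiring (controller n's data line on bit n-1 of the port) is an assumption made for the example, not the actual pin assignment.

```c
#include <stdint.h>

/* Assumed wiring for this sketch only: controller 1 data is bit 0 of the
 * port, controller 2 is bit 1, controller 3 is bit 2, controller 4 is bit 3.
 * samples[] holds the port value captured after the latch and after each
 * of the seven following clock pulses. */
void unpack_port_samples(const uint8_t samples[8], uint8_t pads[4])
{
    for (int p = 0; p < 4; p++) {
        pads[p] = 0;
    }
    for (int bit = 0; bit < 8; bit++) {
        for (int p = 0; p < 4; p++) {
            /* The first sample (the A button) ends up in the top bit. */
            pads[p] = (uint8_t)((pads[p] << 1) | ((samples[bit] >> p) & 0x01u));
        }
    }
}
```

Eight port reads therefore yield one byte per controller, at the cost of the shifting work that has to happen somewhere between reads.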

However, I think you came up with a very interesting code concept.  I have some minor concerns about getting the timing down for sending my latch and clock pulses based on my experience so far, but it's definitely worth a go!  I'll try and test it out today if possible :)

emerson

If you think about it, the controllers already contain the data you need so there is no reason to store that data in variables. This will save lots of time. The console only expects one bit from whichever controller port it is querying so that is all you really need to provide within one clock pulse period.

The main idea is not to assume what signal is coming next but rather structure your code to handle any signal at any program loop iteration.

I forget if you said before but what clock speed are you running your microcontroller at? This is your biggest bottleneck.

P

Quote from: lifewithmatthew on October 05, 2021, 08:29:50 amYes, I have all four controller clock lines tied to a single output on the arduino and all four latch pins tied to a different pin.  The four score ties all the latch pins together, but has dedicated clock pins for each controller.
There is only a single latch signal coming from the console so having them separate wouldn't do anything. There are two different clocks though, $4016.CLK and $4017.CLK.

lifewithmatthew

Quote from: P on October 05, 2021, 10:49:55 am
Quote from: lifewithmatthew on October 05, 2021, 08:29:50 amYes, I have all four controller clock lines tied to a single output on the arduino and all four latch pins tied to a different pin.  The four score ties all the latch pins together, but has dedicated clock pins for each controller.
There is only a single latch signal coming from the console so having them separate wouldn't do anything. There are two different clocks though, $4016.CLK and $4017.CLK.

I know, and I use the two different clocks to handle writing data to the NES, but in terms of me first reading that data, to store it, it doesn't necessarily matter if I split it up or use a single clock pulse and then read the bits into my variables.

So I tested out the code last night and just as I was concerned, the timing doesn't work out.  In fact, after I tried it for a bit, I tried reducing things down to the bare minimum where I used a single controller and made my latch out equal the latch input from the NES, my clock output equal the clock input from the NES and made my data output equal the data input from the controller and it still failed.


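Cont1.Latch = PORTA & 0x01;  //latch out follows the latch input from the NES
Cont1.CLK = PORTA & 0x02;    //clock out follows the clock input from the NES
4016.D0 = Cont1.D0;          //data out follows the data input from the controller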


The timing is the hold up.  By the time I pulse the clock the NES has already read the stale value being written out even though it was a mere 6ish clock cycles later at most.  (My microcontroller is running at 16 MHz). 

I think you still have a good idea here. By minimizing the code I think I can find the happy medium of pre-reading the controller data and getting the status of the latch without having to dedicate an interrupt to it.  Then I can use the rising edge of the interrupt to write the next value such that at the falling edge the NES reads the correct value and I preload the next variable to be written.

And part of that optimization will be giving each controller its own clock pulse.  I had started off that way, but condensed it down by the time I got to the code that works.  And I do have it working, I just think it should be feasible to do it with a single microcontroller if I did things more cleanly.

emerson

For those three lines of code, if you manually assert the latch and clock inputs does the data output work as expected? 16MHz is plenty fast enough and possibly even overkill. Assuming one clock per instruction that's 62.5ns per instruction. Assuming each of those three lines compiles into two lines of assembly code that would still only be (2 x 3) * (62.5 * 10^-9 s) = 375ns.
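The same estimate can be written as a small C helper for trying other clock speeds. It is only a back-of-the-envelope sketch: the function name is made up, and the one-clock-per-instruction figure is the assumption stated above, not something read from a datasheet.

```c
#include <stdint.h>

/* Estimated execution time in nanoseconds for a run of instructions,
 * given how many clock cycles each instruction is assumed to take. */
double execution_time_ns(uint32_t instructions,
                         uint32_t cycles_per_instruction,
                         double clock_hz)
{
    double cycle_ns = 1e9 / clock_hz;
    return (double)instructions * (double)cycles_per_instruction * cycle_ns;
}
```

`execution_time_ns(6, 1, 16e6)` reproduces the 375ns figure; if the chip needs more than one clock per instruction the result scales up by that factor.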

This could possibly be a setup configuration issue and not a timing issue.

When things fail this poorly (no offense) I start from the beginning. Check your configuration fuses and initialization code. Make sure the chip is configured exactly as needed. Unless you need them, disable things like watchdog timers, and make sure your clock source is set to internal or external and at the appropriate rate. Also make sure whatever programming software you're using is pulling the configuration fuses from your hex file and not the software defaults. The software should allow you to read the config fuses off the chip, so if you can, test that to make sure they are correct.

In my code I like to force all unused peripheral features to their disabled condition regardless of their default power-up state. I will scroll through the entire datasheet and disable what I don't need in the order that it appears in the datasheet. Then below that I enable the peripherals I need. You only need basic I/O function so all peripherals should be disabled. Be sure to disable all interrupts as well.

Once you believe the chip is properly configured put a small loop just after your config setup. Have this loop toggle the state of an output pin and test with an oscilloscope. If you see a square wave with the expected period then you know the chip is alive. It will not be the same as your clock speed. At this point you can test all I/O pins if you like.
while (1){
   PORTA.D0 = PORTA.D0 XOR 0x01;
}

Once you are confident the chip is properly initialized then start debugging the code. I don't have a fancy in circuit debugger so I take an unused I/O pin and set it as an output with an LED. Then I will insert a line of code that lights the LED and halts in a loop to test if the code executed that far. If the LED doesn't light up then you know the problem lies just before that loop. This simple trick has helped me debug countless projects.

lifewithmatthew

Quote from: emerson on October 06, 2021, 03:58:06 pmFor those three lines of code, if you manually assert the latch and clock inputs does the data output work as expected?

So I think I may have done some miscommunication at some point here.  I have a version of the code that 100% works (I've tested it against Monopoly and RC PRO AM II).  The reason I tried changing things up is I think you had a *very* clever idea to get it scaled down to one microcontroller instead of using 2 microcontrollers like I currently have to.  I can confirm that I can easily read the controller data into the microcontroller with zero issues.  I can also confirm that I have learned the correct way to get that data back out of it to the NES.

The issue with trying to fully implement the code you suggested is with getting the timing worked out with the NES.  The moment the controller clock goes low, the NES has already read the value and moved on in its code.  Even if I got it out 62.5 ns later, it's still too late as far as the NES is concerned.  As a result, I have to have the data the NES is wanting to see queued up and ready to go on the rising edge of the clock pulse.

Now maybe there could be a way to always be one clock pulse ahead of the NES.  Maybe if on the rising edge of the latch pulse from the NES I sent the latch pulse to the controller, and read the first bit (the "A" button status) and output that status onto the data line then read the next bit such that on the rising edge of the clock pulse from the NES I could load in the next bit (the "B" button status), but that's creating a lot of work to do on the rising edge of the latch and I don't see it as strictly gaining any usefulness compared to how I have it working currently.

In terms of using one microcontroller with 3 interrupts, I think that's an issue with the ATMEGA 4809 I've been testing with.  It says that any pin can be an interrupt, but I suspect that it's not using true hardware interrupts.  When I use the exact same code that works for one pair of controllers and combine it with all 4 controllers, everything just goes haywire.  And I'm keeping the interrupts very small.  At most the interrupt takes 10 instructions to finish, which is short enough to easily execute before the next interrupt would fire.

I think that if I used a different microcontroller I might be able to get away with a single microcontroller.  I thought about trying this out with an ESP32, but it operates at 3v3 and I would have had to buy some additional level shifters to see if it *might* work, so I decided to pass on that for now.

For my testing methods I've gone about this with three boards, all of them ATMEGAs: the 328, 5206, and 4809. As I've jumped around between the three of them with different ideas I've started from the ground up many times, but my testing remains more or less the same.

I confirm that I'm getting accurate data from the controller to the microcontroller.  I confirm that I can send out a specific set of controller presses by hard coding my output with a given number (254 means A is pressed, 253 means B is pressed, 252 is A and B, etc) and that those presses are correctly read by the NES using the excellent testing program provided (seriously thank you, P, without that program this project would have been dead before it started). Then I combine it to get the controller to the NES debugging as necessary.
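The hard-coded test values follow an active-low pattern: a pressed button clears its bit. A minimal C sketch of that encoding is below; the function and constant names are hypothetical, and the bit positions are inferred only from the three examples given (254, 253, 252).

```c
#include <stdint.h>

/* Bit positions inferred from "254 means A" and "253 means B". */
enum { BTN_A = 0x01, BTN_B = 0x02 };

/* Converts a mask of pressed buttons into the active-low byte that is
 * shifted out: every pressed button becomes a 0 bit. */
uint8_t encode_active_low(uint8_t pressed_mask)
{
    return (uint8_t)~pressed_mask;
}
```

For example, `encode_active_low(BTN_A | BTN_B)` yields 252, and no buttons pressed yields 255.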

From this testing, and from looking up information online from people who have tested the timings with oscilloscopes, I've made the following observations about the NES reading process and how it relates to the four score.  I've tested this by checking when interrupts occur, counting how many pulses happen between latches, and testing what combinations actually occur between the latch and clock pins.

  • The NES polls the controllers at a rate of 60 Hz (or about once per 16 ms)
  • The NES uses an ~83 kHz clock to send its latch and clock pulses (one 83 kHz period is about 12 microseconds)
  • Each poll cycle begins with setting the latch pin high for one full cycle of the 83 kHz clock (~12 microseconds); beyond that time latch is low.
  • The clock pulses use an alternating half duty cycle of the 83 kHz clock.  That is to say that the clock is high for 6 microseconds then low for the next 6 microseconds
  • For two players (where only one player played at a time) the NES would send all 8 clock pulses for player 1 followed by all 8 pulses for player 2
  • For games where both players were on the screen at the same time (or for the multitap) it was possible to send the signal interlaced, where controller 1 port's clock would be high when controller 2 port's clock was low, alternating back and forth.
  • Immediately after the latch transitions from high to low, controller 1 sets its clock pin high for 6 microseconds then low for 6 microseconds.
  • As soon as the clock pin is low the data is read from the controller
  • If it's a 2 player game, it will repeat this 8 times before sending the latch again (or 8 pulses for each controller depending on the type of game)
  • If it's a 4 player game, it will repeat this 24 times before sending the latch again (8 times for controllers 1 and 2, 8 times for controllers 3 and 4, and 8 times for the two signatures)
  • Since the NES alternates clock reads between controller 1 and controller 2 ports (see caveats above), 8 or 24 clock duty cycles is all that is required to read 2 or 4 controllers.
  • From an electronic signal perspective the signature for the four score is 0b11110111 for player 1/3 and 0b11111011 for player 2/4
  • When the clock pulse goes high, the next bit to be read is advanced in the controller's 4021 shift register and when it goes low the NES reads the value off the 4021 shift register's serial out pin.
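The timing figures in this list are related by simple arithmetic, sketched here in C. The function names are made up for illustration, and the inputs are the observed values listed above, not scope measurements.

```c
/* Period of one cycle in microseconds for a given frequency in Hz. */
double period_us(double frequency_hz)
{
    return 1e6 / frequency_hz;
}

/* Time in microseconds taken by a burst of back-to-back cycles. */
double burst_us(unsigned cycles, double frequency_hz)
{
    return (double)cycles * period_us(frequency_hz);
}
```

At 83 kHz one cycle is about 12 microseconds, so a 24-cycle four player read lasts roughly 290 microseconds, a small slice of the ~16.7 ms between polls at 60 Hz.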

Hopefully this information helps anyone else who wants to DIY a solution.  I still plan on implementing a turbo feature as well as using a switch circuit to direct players 3/4 data to 4016.D1 / 4017.D1, but that's going to have to be tested later on.

emerson

Quote from: lifewithmatthew on October 07, 2021, 11:24:47 amSo I think I may have done some miscommunication at some point here.

No this was my bad. After my post from yesterday I re-read yours and realized I did not fully read what you wrote. My apologies.

Quote from: lifewithmatthew on October 07, 2021, 11:24:47 amThe issue with trying to fully implement the code you suggested is with getting the timing worked out with the NES.  The moment the controller clock goes low, the NES has already read the value and moved on in its code.  Even if I got it out 62.2 ns later, it's still too late as far as the NES is concerned.  As a result, I have to have the data the NES is wanting to see queued up and ready to go on the rising edge of the clock pulse.

The latch portion was not handled correctly in my sample code. As soon as the 4021 receives the latch pulse it loads Q7 with the data on D7. Once the 4021 receives the first clock pulse it pushes D6 out to Q7, and the process repeats for 8 clocks total. My latch code should look like this.

case 0b00000111:  //latch pulse detected
   4016_cnt = 0;  //reset $4016.CLK counter
   4017_cnt = 0;  //reset $4017.CLK counter
   //possible delays needed here
   4016.D0 = CONT1.D0;
   4017.D0 = CONT2.D0;
break;
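The reason the first bit must be driven at latch time can be seen in a tiny software model of the shift register. This is a PC-side sketch in C with hypothetical names; it models only the load and shift behaviour described above, not the real part's electrical timing.

```c
#include <stdint.h>

/* Minimal model of an 8-bit parallel-in, serial-out shift register. */
typedef struct {
    uint8_t stage;  /* bit 7 is the stage visible on the serial output */
} sr4021;

/* Latch: parallel data is loaded and D7 appears on the output at once. */
void sr4021_latch(sr4021 *sr, uint8_t parallel_in)
{
    sr->stage = parallel_in;
}

/* Level currently present on the serial output (Q7 in the text above). */
uint8_t sr4021_output(const sr4021 *sr)
{
    return (uint8_t)((sr->stage >> 7) & 0x01u);
}

/* One clock pulse: every stage moves up one place and serial_in fills
 * the lowest stage. */
void sr4021_clock(sr4021 *sr, uint8_t serial_in)
{
    sr->stage = (uint8_t)((sr->stage << 1) | (serial_in & 0x01u));
}
```

In this model the output is already valid right after `sr4021_latch`, before any clock, which is why the latch case has to copy the controller data line to the console straight away.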

I disagree with some of the statements on the clock timings. The below image is an oscilloscope capture of a single clock pulse being sent to the controller. Note how small the low portion of the duty cycle is. It's more like 5% low and not 50%. Also note the frequency is about 64kHz and not 83kHz.

[Attachment not available: oscilloscope capture of a single clock pulse sent to the controller]

Here is the same setup but with controller 1 connected through my four score in 4 player mode (still SMB1). The waveform is different but no significant difference in timing.

[Attachment not available: oscilloscope capture with controller 1 connected through the four score in 4 player mode]

If you would like me to do some hardware testing and get scope captures for you I would be happy to do what I can.

lifewithmatthew

October 07, 2021, 09:29:48 pm #42 Last Edit: October 08, 2021, 08:16:32 am by lifewithmatthew
Quote from: emerson on October 07, 2021, 04:27:49 pm
Quote from: lifewithmatthew on October 07, 2021, 11:24:47 amSo I think I may have done some miscommunication at some point here.

No this was my bad. After my post from yesterday I re-read yours and realized I did not fully read what you wrote. My apologies.

No need to apologize, you've been extremely helpful :D

Quote from: emerson on October 07, 2021, 04:27:49 pmThe latch portion was not handled correctly in my sample code. As soon as the 4021 receives the latch pulse it loads Q7 with the data on D7. Once the 4021 receives the first clock pulse it pushes D6 out to Q7, and the process repeats for 8 clocks total. My latch code should look like this.

case 0b00000111:  //latch pulse detected
   4016_cnt = 0;  //reset $4016.CLK counter
   4017_cnt = 0;  //reset $4017.CLK counter
   //possible delays needed here
   4016.D0 = CONT1.D0;
   4017.D0 = CONT2.D0;
break;

Yeah, I had caught that and fixed it during testing.  The way you implemented a switch was extremely helpful in helping me figure out what conditions exist and never occur with the latch and clock pulses.  Specifically if we assume controller 1 clock, controller 2 clock, and latch are the lower 3 bits the following conditions are true:

  • 111: Clock 2 idle, Clock 1 idle, Latch pulse high (beginning of controller read cycle)
  • 110: Clock 2, Clock 1, and Latch all idle (true during the controller read cycle between controller bit reads for either controller)
  • 101: Clock 2 idle, Clock 1 pulse low, Latch pulse high. DOES NOT OCCUR
  • 100: Clock 2 idle, Clock 1 pulse low, Latch idle. (NES reading Controller 1 D0 input)
  • 011: Clock 2 pulse low, Clock 1 idle, Latch pulse high. DOES NOT OCCUR
  • 010: Clock 2 pulse low, Clock 1 idle, Latch idle. (NES reading Controller 2 D0 input)
  • 001: Clock 2 pulse low, Clock 1 pulse low, Latch pulse high. DOES NOT OCCUR
  • 000: Clock 2 pulse low, Clock 1 pulse low, Latch idle. DOES NOT OCCUR
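The table above maps directly onto a small decoder. Here is a sketch in plain C, assuming the same bit order as the list (clock 2 in bit 2, clock 1 in bit 1, latch in bit 0); the enum and function names are made up for illustration.

```c
#include <stdint.h>

typedef enum {
    EV_IDLE,        /* 110: nothing happening */
    EV_LATCH,       /* 111: start of a read cycle */
    EV_CLK_4016,    /* 100: NES reading controller port 1 */
    EV_CLK_4017,    /* 010: NES reading controller port 2 */
    EV_UNEXPECTED   /* combinations listed as DOES NOT OCCUR */
} bus_event;

/* Decodes the lower three bits of the input port into an event. */
bus_event classify_inputs(uint8_t port)
{
    switch (port & 0x07u) {
        case 0x06: return EV_IDLE;
        case 0x07: return EV_LATCH;
        case 0x04: return EV_CLK_4016;
        case 0x02: return EV_CLK_4017;
        default:   return EV_UNEXPECTED;
    }
}
```

Treating the four impossible combinations as a single "unexpected" result makes it easy to count them during testing and confirm they really never show up.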

Quote from: emerson on October 07, 2021, 04:27:49 pmI disagree with some of the statements on the clock timings. The below image is of an oscilloscope capture of single clock pulse being sent to the controller.

Very interesting.  I personally do not have an oscilloscope.  I relied on the following two pages from others who have used scopes to examine pulse timings.
https://hackaday.io/project/170365-blueretro/log/181368-famicom-nes-controller-shift-register-parallel-in-serial-out
https://forums.nesdev.org/viewtopic.php?p=239781&sid=76e116fe248bf4a8458aacc4bf9870b2#p239781

I have no idea why the pulses are different.

I do like how many ways there are to replicate this though.  I do think that a part of the timing issue has to do with some quirks with the ATmega chips and would be interested in how a PIC would handle it.  While my personal belief is the more data the better, I've reached a solution that I am... begrudgingly satisfied with for the moment.  I could try some more stuff with the ATMega5206.  It definitely has enough confirmed hardware interrupts, but one 5206 costs around 9 bucks whereas a 328 costs a buck fifty to 2 dollars.  Even if you count the extra cost from some power filtering components it's still a lot cheaper to just use two 328 chips that work.

I'm sure some day I'll break down and get some level shifters to test out with the ESP32.  That thing runs faster AND has 2 cores to work with... I might go to Amazon right now, lol.  I wonder if the 4021 shift registers could run off 3v3....

emerson

Quote from: lifewithmatthew on October 07, 2021, 09:29:48 pmI do think that a part of the timing issue has to do with some quirks with the ATmega chips and would be interested in how a PIC would handle it.

So I took the time last evening and tried this for myself. Would you believe me if I said it didn't work, haha! The problem I am seeing is the loop that polls the three inputs. This must compile to some length of code that executes longer than the ~580ns low period of the clock pulse. I also found that when the clock is recognized, my switch case approach will finish executing before the next clock is received!

Here is a scope capture and the code modification to allow me to measure code execution time. Note how the pink pulses fit within the clock duration but are only triggered every other iteration. The extra pulses on D0 are from SMB1 polling $4017.
FourScoreEngine:
    D0_4016 = 0;                    //output is cleared at loop restart
    while (0x06 == (PORTB & 0x06)); //loop until any input is received
    D0_4016 = 1;                    //output is set for duration of code execution
    switch (PORTB & 0x07){
    //rest of the code as usual from here


I found this interesting. I fired up my PowerPak and the boot screen polls the controllers at a rate of ~90kHz. Fortunately the low time of the clock stays at ~580ns and only the high time changes. The scope capture below is one of the better responses from my circuit in terms of trigger quantity, and the code still executes fast enough at 90kHz.
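
For a sense of scale, those two numbers can be turned into a period and a high time. This is only a minimal sketch in C, assuming the ~90kHz figure is the clock pulse rate and the low time is fixed at ~580ns; the function names are mine, purely for illustration.

```c
#include <stdint.h>

/* Period of a clock in nanoseconds, rounded down. */
uint32_t period_ns(uint32_t freq_hz)
{
    return 1000000000u / freq_hz;
}

/* High time that remains once the fixed low time is subtracted. */
uint32_t high_time_ns(uint32_t freq_hz, uint32_t low_ns)
{
    return period_ns(freq_hz) - low_ns;
}
```

With 90000 Hz and 580 ns this gives a period of about 11.1 microseconds, of which roughly 10.5 microseconds is high time, so the gap between clocks is still wide even at the PowerPak's rate.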


I will agree at this point that if programming in C, interrupts are probably required. I wrote up a potential solution in assembly during lunchtime at work today. The entire input polling loop takes only 437.5ns at 16MHz. Theoretically a clock pulse should never be missed at this rate. I did not count clocks for every possible execution route but I did label the required cycles. The biggest problem I see is the lack of shadow registers. Below is my assembly code, written in mpasm.

Spoiler
;wreg is the working register, just like an accumulator
;
;clock period is:
; 1/16,000,000 = 62.5ns
;
;Loop movf to movf is:
; 1+1+1+1+2+1 = 7 cycles
;
;Loop execution time is:
; 7*62.5ns = 437.5ns

cblock 0x20 ;declare variables
x
cnt_4016
cnt_4017
sig1
sig2
endc

Loop: ;cycles
movf PORTB,0 ;1 load PORTB into wreg
andlw 0x07 ;1 isolate latch and clock inputs, result stays in wreg
xorlw 0x06 ;1 if inputs are idle then result is zero
btfsc STATUS,Z ;1,2 test zero flag and skip if an input is not idle
goto Loop ;2 repeat until an input changes

;if WREG can be used instead of x then code can be simplified

movwf x ;1 save the non-idle state of inputs
rrf x,1 ;1 rotate latch bit into carry, store remainder in x
btfsc STATUS,C ;1,2 test if latch bit is high
goto LatchHandler ;2 if yes then handle latch pulse
rrf x,1 ;1 rotate 4016 clock into carry, store remainder in x
btfsc STATUS,C ;1,2 skip if 4016 clock is active
goto Cont24Handler ;2 run if carry is set
goto Cont13Handler ;2 run if carry is clear

LatchHandler:
movf D0_Cont1,0 ;1 shift cont1 d7 to console
movwf D0_4016 ;1
movf D0_Cont2,0 ;1 shift cont2 d7 to console
movwf D0_4017 ;1
clrf cnt_4016 ;1 clear controller clock counters
clrf cnt_4017 ;1
movlw 0xf7 ;1 reset sig1 0b11110111
movwf sig1 ;1
movlw 0xfb ;1 reset sig2 0b11111011
movwf sig2 ;1
goto Loop ;2

Cont13Handler:
movf cnt_4016,0 ;1 load clock counter
andlw 0x18 ;1 isolate count bits 8 and 16
btfsc STATUS,Z ;1,2 if fewer than 8 clocks have occurred
goto ReadCont1 ;2 then read cont1
xorlw 0x18 ;1 toggle remaining bits
btfsc STATUS,Z ;1,2 if the result is zero then everything has already been sent
goto NullData13 ;2 so send null data
andlw 0x08 ;1 test the remaining bits, but remember they are now inverted
btfss STATUS,Z ;1,2 if the result is zero, fewer than 16 clocks have occurred so skip ahead and read cont3
goto ReadSig1 ;2 otherwise read signature

;continue to cont3 code below

bcf clk_cont3 ;1 send clock pulse to controller
nop ;1 and allow some time for controller to process
bsf clk_cont3 ;1
movf D0_Cont3,0 ;1 shift controller data to console
movwf D0_4016 ;1
incf cnt_4016 ;1 increment controller counter
goto Loop ;2

ReadCont1:
bcf clk_cont1 ;1 send clock pulse to controller
nop ;1 and allow some time for controller to process
bsf clk_cont1 ;1
movf D0_Cont1,0 ;1 shift controller data to console
movwf D0_4016 ;1
incf cnt_4016 ;1 increment controller counter
goto Loop ;2

NullData13:
setf D0_4016 ;1 this might not be the correct way to do this
goto Loop ;2

ReadSig1:
movf sig1,0 ;1 mov sig1 into wreg
andlw 0x01 ;1 isolate LSB
movwf D0_4016 ;1 shift data to console
incf cnt_4016 ;1 increment clock counter
goto Loop ;2



Cont24Handler:
movf cnt_4017,0 ;1 load clock counter
andlw 0x18 ;1 isolate count bits 8 and 16
btfsc STATUS,Z ;1,2 if fewer than 8 clocks have occurred
goto ReadCont2 ;2 then read cont2
xorlw 0x18 ;1 toggle remaining bits
btfsc STATUS,Z ;1,2 if the result is zero then everything has already been sent
goto NullData24 ;2 so send null data
andlw 0x08 ;1 test the remaining bits, but remember they are now inverted
btfss STATUS,Z ;1,2 if the result is zero, fewer than 16 clocks have occurred so skip ahead and read cont4
goto ReadSig2 ;2 otherwise read signature

;continue to cont4 code below

bcf clk_cont4 ;1 send clock pulse to controller
nop ;1 and allow some time for controller to process
bsf clk_cont4 ;1
movf D0_Cont4,0 ;1 shift controller data to console
movwf D0_4017 ;1
incf cnt_4017 ;1 increment controller counter
goto Loop ;2

ReadCont2:
bcf clk_cont2 ;1 send clock pulse to controller
nop ;1 and allow some time for controller to process
bsf clk_cont2 ;1
movf D0_Cont2,0 ;1 shift controller data to console
movwf D0_4017 ;1
incf cnt_4017 ;1 increment controller counter
goto Loop ;2

NullData24:
setf D0_4017 ;1 this might not be the correct way to do this
goto Loop ;2

ReadSig2:
movf sig2,0 ;1 mov sig2 into wreg
andlw 0x01 ;1 isolate LSB
movwf D0_4017 ;1 shift data to console
incf cnt_4017 ;1 increment clock counter
goto Loop ;2
[close]
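
The cycle arithmetic in the comments at the top of the listing can be checked against the ~580ns clock low time. Here is a rough sketch in C, with hypothetical helper names. It takes the instruction rate as a parameter because on 8-bit PICs one instruction cycle is four oscillator clocks, so 62.5ns per cycle assumes the core really executes 16 million instructions per second rather than running from a 16MHz oscillator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Time of one instruction cycle in picoseconds for a given
   instruction rate (instructions per second, not oscillator Hz). */
uint32_t cycle_time_ps(uint32_t instr_per_sec)
{
    return (uint32_t)(1000000000000ULL / instr_per_sec);
}

/* Total time of a loop that takes the given number of cycles. */
uint32_t loop_time_ps(uint32_t cycles, uint32_t instr_per_sec)
{
    return cycles * cycle_time_ps(instr_per_sec);
}

/* True if the loop completes strictly inside the window (in ns). */
bool fits_in_window(uint32_t loop_ps, uint32_t window_ns)
{
    return loop_ps < window_ns * 1000u;
}
```

With 7 cycles at 16 million instructions per second the loop takes 437.5ns and fits inside 580ns. At 4 million instructions per second (a 16MHz oscillator divided by four) the same loop takes 1750ns and does not.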

I may test this over the weekend after I tinker with my (many) other projects.

P

October 08, 2021, 05:28:12 pm #44 Last Edit: October 08, 2021, 05:37:32 pm by P
The 4000-series are CMOS so they are low-power and have a wide voltage range (something like 3-15 V I think).
It's nice to see that you got it to work!


Quote from: lifewithmatthew on October 07, 2021, 11:24:47 am
I confirm that I'm getting accurate data from the controller to the microcontroller.  I confirm that I can send out a specific set of controller presses by hard coding my output with a given number (254 means A is pressed, 253 means B is pressed, 252 is A and B, etc) and that those presses are correctly read by the NES using the excellent testing program provided (seriously thank you, P, without that program this project would have been dead before it started).
I'm just glad that my little experiments were of some use. It should be said that my program has a little bug that I was unable to locate: if B is pressed the SELECT indicator on the same controller won't update for some reason.
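
The hard-coded values in the quote (254 for A, 253 for B, 252 for A and B) follow from an active-low byte where a 0 bit means pressed. Below is a minimal sketch in C. The A and B bit positions come from those numbers; the positions of the other six buttons are my assumption, following the usual NES read order, and the names are only illustrative.

```c
#include <stdint.h>

/* Bit positions: A and B match the quoted values; the rest are
   assumed to follow the standard NES read order. */
enum {
    BTN_A      = 1u << 0,
    BTN_B      = 1u << 1,
    BTN_SELECT = 1u << 2,
    BTN_START  = 1u << 3,
    BTN_UP     = 1u << 4,
    BTN_DOWN   = 1u << 5,
    BTN_LEFT   = 1u << 6,
    BTN_RIGHT  = 1u << 7
};

/* Turn a mask of pressed buttons into the active-low byte:
   pressed buttons become 0 bits, released buttons become 1 bits. */
uint8_t encode_active_low(uint8_t pressed)
{
    return (uint8_t)~pressed;
}

/* Level to put on the data line for read number n (0 to 7),
   sending the least significant bit (A) first. */
uint8_t bit_for_read(uint8_t encoded, unsigned n)
{
    return (uint8_t)((encoded >> (n & 7u)) & 1u);
}
```

For example, `encode_active_low(BTN_A | BTN_B)` gives 252, and with nothing pressed the byte is 255.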


Oh and by the way, about alternating clock pulses: this of course depends on the game. A game only using one controller might only clock $4016. It would be a silly way to read more controllers, but it is of course possible to read all controllers without alternating the pulses by latching multiple times. My program actually latches twice per frame, once for the signature and once for the buttons. It's not very efficient, but my program is way too small to cause any slowdown so I didn't really worry about that. Not sure if that matters for this multitap project, I just want to make it clear.
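
Since a game may clock only one port, or latch more than once per frame, one way to cope is an independent read counter per port that every latch resets, which is what the two counters in emerson's assembly do. Here is a minimal sketch of that bookkeeping in C. The type and function names are hypothetical, and it only tracks which source supplies each bit; it does not model exactly when the data line changes.

```c
#include <stdint.h>

/* Which source supplies the current bit on one port. */
typedef enum {
    SRC_FIRST_PAD,   /* controller 1 on $4016, controller 2 on $4017 */
    SRC_SECOND_PAD,  /* controller 3 on $4016, controller 4 on $4017 */
    SRC_SIGNATURE,   /* the 8 signature bits */
    SRC_IDLE         /* all 24 bits already sent */
} source_t;

typedef struct {
    uint8_t clocks;  /* bits read on this port since the last latch */
} port_state_t;

/* A latch pulse restarts the sequence on this port. */
void port_latch(port_state_t *p)
{
    p->clocks = 0;
}

/* Called once per clock pulse on this port; returns the source
   for this bit. The counter stops at 24 so it cannot wrap. */
source_t port_clock(port_state_t *p)
{
    source_t src;

    if (p->clocks < 8) {
        src = SRC_FIRST_PAD;
    } else if (p->clocks < 16) {
        src = SRC_SECOND_PAD;
    } else if (p->clocks < 24) {
        src = SRC_SIGNATURE;
    } else {
        return SRC_IDLE;
    }
    p->clocks++;
    return src;
}
```

With one `port_state_t` for $4016 and one for $4017, clocking only $4016 leaves the $4017 counter untouched, and a second latch in the same frame simply starts both sequences over.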