Operant Conditioning

Operant conditioning is a form of learning in which voluntary behaviors are modified by their consequences. It deals with behaviors that operate on the environment and are influenced by reinforcement or punishment. Four basic procedures are described: positive reinforcement, which increases behavior by adding a rewarding stimulus; negative reinforcement, which increases behavior by removing an aversive stimulus; positive punishment, which decreases behavior by adding an aversive stimulus; and negative punishment, which decreases behavior by removing a rewarding stimulus. A fifth procedure, extinction, decreases behavior by withholding reinforcement. Schedules of reinforcement such as fixed ratio and variable ratio determine how often behaviors are rewarded.


Operant conditioning

Operant conditioning is a form of psychological learning in which an individual modifies the occurrence and form of its own behavior due to the association of the behavior with a stimulus. Operant conditioning is distinguished from classical conditioning (also called respondent conditioning) in that operant conditioning deals with the modification of "voluntary behavior," or operant behavior. Operant behavior "operates" on the environment and is maintained by its consequences, while classical conditioning deals with the conditioning of reflexive behaviors, which are elicited by antecedent conditions. Behaviors conditioned via a classical conditioning procedure are not maintained by consequences.[1]

Reinforcement, punishment, and extinction


Reinforcement and punishment, the core tools of operant conditioning, are either positive (delivered following a response) or negative (withdrawn following a response). This creates a total of four basic consequences, with the addition of a fifth procedure known as extinction (i.e., no change in consequences following a response). It is important to note that actors are not spoken of as being reinforced, punished, or extinguished; it is the actions that are reinforced, punished, or extinguished. Additionally, reinforcement, punishment, and extinction are not terms whose use is restricted to the laboratory. Naturally occurring consequences can also be said to reinforce, punish, or extinguish behavior, and are not always delivered by people.
- Reinforcement is a consequence that causes a behavior to occur with greater frequency.
- Punishment is a consequence that causes a behavior to occur with less frequency.
- Extinction is the lack of any consequence following a behavior. When a behavior is inconsequential (producing neither favorable nor unfavorable consequences), it will occur with less frequency. When a previously reinforced behavior is no longer reinforced with either positive or negative reinforcement, it leads to a decline in the response.

Four contexts of operant conditioning


Here the terms positive and negative are not used in their popular sense: positive refers to addition, and negative refers to subtraction. What is added or subtracted may be either reinforcement or punishment. Hence positive punishment is sometimes a confusing term, as it denotes the "addition" of a stimulus, or an increase in the intensity of a stimulus, that is aversive (such as spanking or an electric shock). The four procedures are:

1. Positive reinforcement (Reinforcement) occurs when a behavior (response) is followed by a stimulus that is appetitive or rewarding, increasing the frequency of that behavior. In the Skinner box experiment, a stimulus such as food or sugar solution can be delivered when the rat engages in a target behavior, such as pressing a lever.

2. Negative reinforcement (Escape) occurs when a behavior (response) is followed by the removal of an aversive stimulus, thereby increasing that behavior's frequency. In the Skinner box experiment, negative reinforcement can be a loud noise continuously sounding inside the rat's cage until it engages in the target behavior, such as pressing a lever, upon which the loud noise is removed.

(Positive and negative punishment, the remaining two procedures, are described after the following discussion of reinforcement in training.)

Positive Reinforcement

This is possibly the easiest, most effective consequence for a trainer to control (and easy to understand, too!). Positive reinforcement means starting or adding Something Good, something the animal likes or enjoys. Because the animal wants to gain that Good Thing again, it will repeat the behavior that seems to cause that consequence.

Examples of positive reinforcement:

- The dolphin gets a fish for doing a trick.
- The worker gets a paycheck for working.
- The dog gets a piece of liver for returning when called.
- The cat gets comfort for sleeping on the bed.
- The wolf gets a meal for hunting the deer.
- The child gets dessert for eating her vegetables.
- The dog gets attention from his people when he barks.
- The elephant seal gets a chance to mate for fighting off rivals.
- The child gets ice cream for begging incessantly.
- The toddler gets picked up and comforted for screaming.
- The dog gets to play in the park for pulling her owner there.
- The snacker gets a candy bar for putting money in the machine.

Secondary positive reinforcers and bridges

A primary positive reinforcer is something that the animal does not have to learn to like. It comes naturally, no experience necessary. Primary R+s usually include food and water, often include sex (the chance to mate) and the chance to engage in instinctive behaviors, and, for social animals, the chance to interact with others.

A secondary positive reinforcer is something that the animal has to learn to like. The learning can be accomplished through classical conditioning or through some other method. A paycheck is a secondary reinforcer; just try writing a check to reward a young child for potty training!

Animal trainers will often create a special secondary reinforcer they call a bridge. A bridge is a stimulus that has been associated with a primary reinforcer through classical conditioning. This process creates a conditioned positive reinforcer, often called a conditioned reinforcer or CR for short. Animals that have learned a bridge react to it almost as they would to the reward that follows (animals that have learned what clicker training is all about may sometimes prefer the CR that tells them they got it right over the actual "reward").

Schedules of Reinforcement and Extinction

A schedule of reinforcement determines how often a behavior is going to result in a reward. There are five kinds: fixed interval, variable interval, fixed ratio, variable ratio, and random. (A toy simulation of the first four appears after this section.)

A fixed interval means that a reward will occur after a fixed amount of time, for example, every five minutes. Paychecks work on this schedule: every two weeks I got one.

A variable interval schedule means that reinforcers will be distributed after a varying amount of time. Sometimes it will be five minutes, sometimes three, sometimes seven, sometimes one. My e-mail account works on this system: at varying intervals I get new mail (for me, e-mail is generally a Good Thing!).

A fixed ratio means that if a behavior is performed X number of times, there will be one reinforcement on the Xth performance. For a fixed ratio of 1:3, every third behavior will be rewarded. This type of ratio tends to lead to lousy performance with some animals and people, since they know that the first two performances will not be rewarded, and the third one will be no matter what. Some assembly-line production systems work on this schedule: the worker gets paid for every 10 widgets she makes. A fixed ratio of 1:1 means that every correct performance of a behavior will be rewarded.

A variable ratio schedule means that reinforcers are distributed based on the average number of correct behaviors. A variable ratio of 1:3 means that, on average, one out of every three behaviors will be rewarded. It might be the first. It might be the third. It might even be the fourth, as long as it averages out to one in three. This is often referred to as a variable schedule of reinforcement or VSR (in other words, it's often assumed that when someone writes "VSR" they are referring to a variable ratio schedule of reinforcement).

With a random schedule, there is no correlation between the animal's behavior and the consequence. This is how Fate works.

If reinforcement fails to occur after a behavior that has been reinforced in the past, the behavior might extinguish. This process is called extinction. A variable ratio schedule of reinforcement makes the behavior less vulnerable to extinction. If you're not expecting to gain a reward every time you accomplish a behavior, you are not likely to stop the first few times your action fails to generate the desired consequence. This is the principle that slot machines are based on: "OK, I didn't win this time, but next time I'm almost sure to win!"

When a behavior that has been strongly reinforced in the past no longer gains a reinforcement, you might see what's called an extinction burst. This is when the animal performs the behavior over and over again, in a burst of activity. Extinction bursts are something for trainers to watch out for!

Recently Bob Bailey has cautioned against needlessly using variable schedules. Most useful behaviors, he points out, will get some sort of reinforcement every time. You might not always click and treat your dog for sitting on cue, but you will always reward it with some recognition and praise ("Good dog!"). If there are some circumstances where you will be unable to deliver any reinforcement (during a long sequence of behaviors, or when the animal is out of contact), then you will need to build a buffer against extinction with a VSR. Otherwise, don't bother.
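To make the mechanics concrete, here is a minimal Python sketch of the fixed-ratio, variable-ratio, fixed-interval, and variable-interval schedules described above. It is not from the original text; all parameter values are illustrative, and the variable-interval waits are drawn uniformly around a mean as a simplifying assumption. Each schedule is a function that, given the time of a response, reports whether that response earns a reinforcer.

```python
import random

def fixed_ratio(n):
    """Reinforce every nth response (FR-n)."""
    count = 0
    def schedule(_t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return schedule

def variable_ratio(n):
    """Reinforce each response with probability 1/n, i.e. 1-in-n on average (VR-n)."""
    return lambda _t: random.random() < 1.0 / n

def fixed_interval(seconds):
    """Reinforce the first response after `seconds` have elapsed (FI)."""
    last = [0.0]
    def schedule(t):
        if t - last[0] >= seconds:
            last[0] = t
            return True
        return False
    return schedule

def variable_interval(mean_seconds):
    """Like fixed interval, but the required wait varies around a mean (VI)."""
    wait = [random.uniform(0, 2 * mean_seconds)]
    last = [0.0]
    def schedule(t):
        if t - last[0] >= wait[0]:
            last[0] = t
            wait[0] = random.uniform(0, 2 * mean_seconds)
            return True
        return False
    return schedule

if __name__ == "__main__":
    vr3 = variable_ratio(3)
    rewards = sum(vr3(t) for t in range(1000))
    print(f"VR-3: {rewards} rewards over 1000 responses (~333 expected)")
```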

Negative Reinforcement

Negative reinforcement increases a behavior by ending or taking away Something Bad or aversive. By making the animal's circumstances better, you are rewarding it and increasing the likelihood that it will repeat the behavior that was occurring when you ended the Bad Thing.

In order to use negative reinforcement, the trainer must be able to control the Bad Thing that is being taken away. This often means that the trainer must also apply the Bad Thing. And applying a Bad Thing might reduce whatever behavior was going on when the Bad Thing was applied. Reducing a behavior by applying a Bad Thing is positive punishment, so when you start the Bad Thing that you're going to end as a negative reinforcer, you run the risk of punishing some other behavior.

One of the major results of taking away Something Bad is often relief. So another way to think of negative reinforcement is that you are providing relief to the animal, but of course, that makes it sound like positive reinforcement: you are providing Something Good, namely relief. Confusing?

Examples

- The choke collar is loosened when the dog moves closer to the trainer.
- The ear pinch stops when the dog takes the dumbbell.
- The reins are loosened when the horse slows down.
- The car buzzer turns off when you put on your seatbelt.
- Dad continues driving toward Disneyland when the kids are quiet.
- "I'm not talking to you until you apologize!"
- The hostage is released when the ransom is paid.
- The torture is stopped when the victim confesses.
- "Why do I keep hitting my head against the wall? 'Cause it feels so good when I stop!"
- The baby stops crying when his mom feeds him.

Secondary Negative Reinforcers

Trainers seldom go to the trouble of associating a particular cue with negative reinforcement. You can still go ahead and do it.

Cautions in using positive reinforcement

- If the animal is acting out of fear, you may be rewarding the fear response. This can happen when you coddle a shy dog.
- The timing must be good. If the animal did a great "stay" and you reward after the release, you are rewarding getting up.
- The reward has to be sufficient to motivate a repetition. Mild praise won't be enough for some animals; others require the richest of food rewards.
- Reinforcements can become associated with the person giving them. If the animal realizes that he can't get any rewards without you present, he will not be motivated to act.
- Animals can become sated with the reward you're offering; when they've had enough, it will no longer be motivating.
- Reinforcers increase behavior. If you don't want your animal actively trying out new behaviors ("throwing behaviors at the trainer"), don't use positive reinforcement. Use positive reinforcement to train an animal to do something.

3. Positive punishment (Punishment), also called "punishment by contingent stimulation," occurs when a behavior (response) is followed by a stimulus, such as a shock or loud noise, resulting in a decrease in that behavior.

4. Negative punishment (Penalty), also called "punishment by contingent withdrawal," occurs when a behavior (response) is followed by the removal of a stimulus, such as taking away a child's toy following an undesired behavior, resulting in a decrease in that behavior.

Negative Punishment

Negative punishment reduces behavior by taking away Something Good. If the animal was enjoying or depending on Something Good, she will work to avoid having it taken away, and is less likely to repeat a behavior that results in the loss of a Good Thing. This type of consequence is a little harder to control.

Examples

- The child has his crayons taken away for fighting with his sister.
- The window looking into the other monkey's enclosure is shut when the first monkey bites the trainer.
- "This car isn't getting any closer to Disneyland while you kids are fighting!"
- The dog is put on leash and taken from the park for coming to the owner when the owner called (with the unintended result that the dog becomes less likely to respond to the recall).
- The teenager is grounded for misbehavior.
- The dolphin trainer walks away with the fish bucket when the dolphin acts aggressive.
- "I'm not talking to you after what you did!"
- Xena the Warrior Princess cuts off the air of an opponent who refuses to tell her what she wants.

Secondary Negative Punishers

Trainers seldom go to the trouble of associating a particular cue with negative punishment. It's sometimes called a "delta," from SD or discriminative stimulus. Some dog owners make the mistake of calling their dogs in the park and then using the negative punishment of taking the dog away from the fun; "Fido, come!" then becomes a conditioned negative punisher. My mom conditioned a similar CP- as "Time to go!"

Positive Punishment

Positive punishment is something that is applied to reduce a behavior. The term "positive" often confuses people, because in common usage "positive" means something good, upbeat, happy, pleasant, rewarding. Remember, this is technical terminology: here "positive" means "added" or "started." Also keep in mind that in these terms, it is not the animal that is "punished" (treated badly to pay for some moral wrong), but the behavior that is "punished" (in other words, reduced). Positive punishment, when applied correctly, is the most effective way to stop unwanted behaviors. Its main flaw is that it does not teach specific alternative behaviors.

Examples

Our society seems to have a great fondness for positive punishment, in spite of all the problems associated with it (see below).

- The peeing on the rug (by a puppy) is punished with a swat of the newspaper.
- A dog's barking is punished with a startling squirt of citronella.
- The driver's speeding results in a ticket and a fine.
- The baby's hand is burned when she touches the hot stove.
- Walking straight through low doorways is punished with a bonk on the head.

In all of these cases, the consequence (the positive punishment) reduces the behavior's future occurrences.

Secondary Positive Punishers

Because a positive punisher, like other consequences, must follow a behavior immediately or be clearly connected to it to be effective, a secondary positive punisher is very important (especially if the punisher is going to be something highly aversive or painful). Many dog trainers actively condition the word "No!" with some punisher, to form an association between the word and the consequence. The conditioned punisher (CP+) is an important part of training with operant conditioning. (With all four procedures now on the table, a compact summary follows in the sketch below.)
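Since positive/negative and reinforcement/punishment combine into exactly four procedures, the contingency table can be written down directly. This is a summary of my own construction, not from the source; the labels match the definitions above.

```python
# The four operant procedures as a lookup from (operation, stimulus type)
# to (procedure name, expected effect on response frequency).
PROCEDURES = {
    ("add", "appetitive"):    ("positive reinforcement", "frequency increases"),
    ("remove", "aversive"):   ("negative reinforcement", "frequency increases"),
    ("add", "aversive"):      ("positive punishment",    "frequency decreases"),
    ("remove", "appetitive"): ("negative punishment",    "frequency decreases"),
}

def classify(operation: str, stimulus: str) -> str:
    name, effect = PROCEDURES[(operation, stimulus)]
    return f"{name}: {effect}"

if __name__ == "__main__":
    print(classify("add", "appetitive"))   # positive reinforcement
    print(classify("remove", "aversive"))  # negative reinforcement
    print(classify("add", "aversive"))     # positive punishment
    print(classify("remove", "appetitive"))  # negative punishment
```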

Cautions in using Positive Punishment

Behaviors are usually motivated by the expectation of some reward, and even with a punishment, the motivation of the reward is often still there. For example, a predator must face considerable risk and pain in order to catch food. A wild dog must run over rough ground and through bushes, and face the hooves, claws, teeth, and/or horns of its prey. It might be painfully injured in the pursuit. In spite of this, it continues to pursue prey. In this case, the motivation and the reward far outweigh the punishments, even when they are dramatic.

The timing of a positive punishment must be exquisite. It must correspond exactly with the behavior for it to have an effect. (If a conditioned punisher is used, the CP+ must occur precisely with the behavior.) If you catch your dog chewing on the furniture and you hit him when he comes to you, you are suppressing coming to you. The dog will not make the connection between the punishment and the chewing (no matter how much you point at the furniture).

The aversive must be sufficient to stop the behavior in its tracks, and must be greater than the reward. The more experience the animal has with a rewarding consequence for the behavior, the greater the aversive has to be to stop or decrease the behavior. If you start with a small aversive (a mild electric shock or a stern talking-to) and build up to a greater one (a strong shock or full-on yelling), your trainee may become adjusted to the aversive and it will not have any greater effect.

Punishments may become associated with the person supplying them. The dog who was hit after chewing on the furniture may still chew on the furniture, but he certainly won't do it when you're around!

Physical punishments can cause physical damage, and mental punishments can cause mental damage. You should only apply as much of an aversive as it takes to stop the behavior. If you find you have to apply a punishment more than three times for one behavior, without any decrease in the behavior, you are not "reducing the behavior"; you are harassing (or abusing) the trainee.

Punishers suppress behaviors. Use positive punishment to train an animal not to do something.

Internal Reinforcers and Punishers


Trainers cannot control all reinforcers and punishers, unfortunately. There are a number of environmental factors that will affect the animal's behavior which you have no control over, but which will still be significant consequences for your trainee. Some of these come from the animal's internal environment: its own reactions. Relief from stress, pain, or boredom is a common reinforcer, and some "self-reinforcing" behaviors are actually maintained because of this. Examples are a dog barking because it relieves boredom, or a person chewing on her fingers or smoking a cigarette because it relieves stress. Drivers speed because it is fun. Guilt is an internal punisher that some people experience.

Also:

- Avoidance learning is a type of learning in which a certain behavior results in the cessation of an aversive stimulus. For example, performing the behavior of shielding one's eyes when in the sunlight (or going indoors) will help avoid the aversive stimulation of having light in one's eyes.
- Extinction occurs when a behavior (response) that had previously been reinforced is no longer effective. In the Skinner box experiment, this is the rat pushing the lever and being rewarded with a food pellet several times, and then pushing the lever and never receiving a food pellet again. Eventually the rat would cease pushing the lever.
- Noncontingent reinforcement refers to delivery of reinforcing stimuli regardless of the organism's (aberrant) behavior. The idea is that the target behavior decreases because it is no longer necessary to receive the reinforcement. This typically entails time-based delivery of stimuli identified as maintaining aberrant behavior, which serves to decrease the rate of the target behavior.[2] As no measured behavior is identified as being strengthened, there is controversy surrounding the use of the term noncontingent "reinforcement".[3]
- Shaping is a form of operant conditioning in which increasingly accurate approximations of a desired response are reinforced[4] (a toy simulation of shaping follows this list).
- Chaining is an instructional procedure which involves reinforcing individual responses occurring in a sequence to form a complex behavior.[4]
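The shaping entry lends itself to a small simulation: reinforce any variation that lands closer to the target than the current response, and the response drifts toward a form it would never reach in one step. This is a hedged sketch of my own; the target value, trial count, and Gaussian variability are arbitrary illustrative choices.

```python
import random

def shape(target: float, trials: int = 2000) -> float:
    """Drift a response toward `target` by reinforcing successive approximations."""
    response = 0.0
    for _ in range(trials):
        attempt = response + random.gauss(0, 1)        # natural variability
        if abs(attempt - target) < abs(response - target):
            response = attempt                          # reinforced variant persists
        # closer attempts are kept; worse attempts go unreinforced and fade
    return response

if __name__ == "__main__":
    print(f"final response: {shape(target=25.0):.2f} (target 25.0)")
```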

Thorndike's law
Operant conditioning, sometimes called instrumental conditioning or instrumental learning, was first extensively studied by Edward L. Thorndike (1874–1949), who observed the behavior of cats trying to escape from home-made puzzle boxes.[5] When first constrained in the boxes, the cats took a long time to escape. With experience, ineffective responses occurred less frequently and successful responses occurred more frequently, enabling the cats to escape in less time over successive trials. In his law of effect, Thorndike theorized that successful responses, those producing satisfying consequences, were "stamped in" by the experience and thus occurred more frequently. Unsuccessful responses, those producing annoying consequences, were stamped out and subsequently occurred less frequently. In short, some consequences strengthened behavior and some consequences weakened behavior. Thorndike produced the first known learning curves through this procedure.

B. F. Skinner (1904–1990) formulated a more detailed analysis of operant conditioning based on reinforcement, punishment, and extinction. Following the ideas of Ernst Mach, Skinner rejected Thorndike's mediating structures required by "satisfaction" and constructed a new conceptualization of behavior without any such references. So, while experimenting with some homemade feeding mechanisms, Skinner invented the operant conditioning chamber, which allowed him to measure rate of response as a key dependent variable using a cumulative record of lever presses or key pecks.[6]

Biological correlates of operant conditioning

The first scientific studies identifying neurons that responded in ways that suggested they encode for conditioned stimuli came from work by Mahlon deLong[7][8] and by R.T. "Rusty" Richardson.[8] They showed that nucleus basalis neurons, which release acetylcholine broadly throughout the cerebral cortex, are activated shortly after a conditioned stimulus, or after a primary reward if no conditioned stimulus exists. These neurons are equally active for positive and negative reinforcers, and have been demonstrated to cause plasticity in many cortical regions.[9] Evidence also exists that dopamine is activated at similar times. There is considerable evidence that dopamine participates in both reinforcement and aversive learning.[10] Dopamine pathways project much more densely onto frontal cortex regions. Cholinergic projections, in contrast, are dense even in the posterior cortical regions like the primary visual cortex. A study of patients with Parkinson's disease, a condition attributed to the insufficient action of dopamine, further illustrates the role of dopamine in positive reinforcement.[11] It showed that while off their medication, patients learned more readily with aversive consequences than with positive reinforcement. Patients who were on their medication showed the opposite to be the case, positive reinforcement proving to be the more effective form of learning when the action of dopamine is high.

Factors that alter the effectiveness of consequences


When using consequences to modify a response, the effectiveness of a consequence can be increased or decreased by various factors. These factors can apply to either reinforcing or punishing consequences. (A toy model combining these factors appears at the end of this section.)

1. Satiation/Deprivation: The effectiveness of a consequence will be reduced if the individual's "appetite" for that source of stimulation has been satisfied. Inversely, the effectiveness of a consequence will increase as the individual becomes deprived of that stimulus. If someone is not hungry, food will not be an effective reinforcer for behavior. Satiation is generally only a potential problem with primary reinforcers, those that do not need to be learned, such as food and water.

2. Immediacy: How immediately a consequence is felt after a response determines its effectiveness. More immediate feedback is more effective than less immediate feedback. If someone's license plate is caught by a traffic camera for speeding and they receive a speeding ticket in the mail a week later, this consequence will not be very effective against speeding. But if someone is speeding and is caught in the act by an officer who pulls them over, then their speeding behavior is more likely to be affected.[citation needed]

3. Contingency: If a consequence does not contingently (reliably, consistently) follow the target response, its effectiveness upon the response is reduced. But if a consequence follows the response consistently after successive instances, its ability to modify the response is increased. A consistent schedule of reinforcement leads to faster learning; when the schedule is variable, learning is slower. Behavior is more difficult to extinguish when learning occurred under intermittent reinforcement, and more easily extinguished when learning occurred under a highly consistent schedule.

4. Size: This is a "cost-benefit" determinant of whether a consequence will be effective. If the size, or amount, of the consequence is large enough to be worth the effort, the consequence will be more effective upon the behavior. An unusually large lottery jackpot, for example, might be enough to get someone to buy a one-dollar lottery ticket (or even to buy multiple tickets). But if a lottery jackpot is small, the same person might not feel it to be worth the effort of driving out and finding a place to buy a ticket. In this example, it is also useful to note that "effort" is itself a punishing consequence. How these opposing expected consequences (reinforcing and punishing) balance out will determine whether the behavior is performed or not.

Most of these factors exist for biological reasons. The biological purpose of the principle of satiation is to maintain the organism's homeostasis. When an organism has been deprived of sugar, for example, the effectiveness of the taste of sugar as a reinforcer is high. However, as the organism reaches or exceeds its optimum blood-sugar level, the taste of sugar becomes less effective, perhaps even aversive.

The principles of immediacy and contingency exist for neurochemical reasons. When an organism experiences a reinforcing stimulus, dopamine pathways in the brain are activated. This network of pathways "releases a short pulse of dopamine onto many dendrites, thus broadcasting a rather global reinforcement signal to postsynaptic neurons."[12] This results in plasticity at these synapses, allowing recently activated synapses to increase their sensitivity to efferent signals, hence increasing the probability of occurrence for the recent responses that preceded the reinforcement. These responses are, statistically, the most likely to have been the behavior responsible for successfully achieving reinforcement. But when the application of reinforcement is either less immediate or less contingent (less consistent), the ability of dopamine to act upon the appropriate synapses is reduced.
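As a rough illustration only, the four factors can be combined multiplicatively, with hyperbolic delay discounting standing in for immediacy. This toy model is my own construction, not a formula from the text; the function name, the constant k, and all input values are assumptions for demonstration.

```python
def effectiveness(size: float, delay_s: float, contingency: float,
                  deprivation: float, k: float = 0.1) -> float:
    """Toy estimate of a consequence's effectiveness.

    size        -- magnitude of the consequence (arbitrary units)
    delay_s     -- seconds between response and consequence
    contingency -- probability the consequence follows the response (0-1)
    deprivation -- 0 = fully sated, 1 = fully deprived
    k           -- discounting steepness (assumed parameter)
    """
    immediacy = 1.0 / (1.0 + k * delay_s)   # hyperbolic delay discounting
    return size * immediacy * contingency * deprivation

if __name__ == "__main__":
    # Ticket in the mail a week later vs. being pulled over within a minute:
    late = effectiveness(size=100, delay_s=604800, contingency=0.5, deprivation=1.0)
    now = effectiveness(size=100, delay_s=60, contingency=1.0, deprivation=1.0)
    print(f"mailed ticket: {late:.4f}, pulled over: {now:.4f}")
```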

Operant variability
Operant variability is what allows a response to adapt to new situations. Operant behavior is distinguished from reflexes in that its response topography (the form of the response) is subject to slight variations from one performance to another. These slight variations can include small differences in the specific motions involved, differences in the amount of force applied, and small changes in the timing of the response. If a subject's history of reinforcement is consistent, such variations will remain stable because the same successful variations are more likely to be reinforced than less successful variations. However, behavioral variability can also be altered when subjected to certain controlling variables.[13]

Avoidance learning
Avoidance learning is a form of negative reinforcement: the subject learns that a certain response will result in the termination or prevention of an aversive stimulus. Two kinds of experimental settings are commonly used: discriminated and free-operant avoidance learning.

Discriminated avoidance learning

In discriminated avoidance learning, a novel stimulus such as a light or a tone is followed by an aversive stimulus such as a shock (CS–US, similar to classical conditioning). During the first trials (called escape trials), the animal usually experiences both the CS (conditioned stimulus) and the US (unconditioned stimulus), performing the operant response to terminate the aversive US. During later trials, the animal learns to perform the response already during the presentation of the CS, thus preventing the aversive US from occurring. Such trials are called "avoidance trials."

Free-operant avoidance learning


In this experimental setting, no discrete stimulus is used to signal the occurrence of the aversive stimulus. Rather, the aversive stimuli (usually shocks) are presented without explicit warning stimuli. Two crucial time intervals determine the rate of avoidance learning. The first is the S-S interval (shock-shock interval): the amount of time that passes between successive presentations of the shock (unless the operant response is performed). The second is the R-S interval (response-shock interval), which specifies the length of the time interval following an operant response during which no shocks will be delivered. Each time the organism performs the operant response, the R-S interval without shocks begins anew. (The sketch below simulates this arrangement.)
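Under these definitions, the shock scheduler is easy to simulate. A minimal Python sketch follows; the response behavior here is random, purely to exercise the S-S/R-S logic, and all parameter values are illustrative rather than taken from any experiment.

```python
import random

def session(ss: float, rs: float, duration: float, p_respond: float = 0.05):
    """Simulate a free-operant avoidance session, one-second time steps.

    ss       -- S-S interval: seconds between shocks absent a response
    rs       -- R-S interval: shock-free seconds bought by each response
    duration -- session length in seconds
    """
    t, next_shock = 0.0, ss
    shocks = responses = 0
    while t < duration:
        t += 1.0
        if random.random() < p_respond:   # subject emits the operant response
            responses += 1
            next_shock = t + rs           # R-S interval restarts; shock postponed
        if t >= next_shock:               # no response in time: shock delivered
            shocks += 1
            next_shock = t + ss           # S-S interval until the next shock
    return shocks, responses

if __name__ == "__main__":
    shocks, responses = session(ss=5.0, rs=30.0, duration=600.0)
    print(f"{shocks} shocks, {responses} responses in 10 minutes")
```

With a long R-S interval relative to the S-S interval, even infrequent responding postpones most shocks, which is why responding pays off in this paradigm.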

Two-process theory of avoidance


This theory was originally proposed to explain learning in discriminated avoidance learning. It assumes that two processes take place:

a) Classical conditioning of fear. During the first trials of training, the organism experiences both the CS and the aversive US (escape trials). The theory assumes that during those trials classical conditioning takes place by pairing the CS with the US. Because of the aversive nature of the US, the CS comes to elicit a conditioned emotional reaction (CER): fear. In classical conditioning, presenting a CS conditioned with an aversive US disrupts the organism's ongoing behavior.

b) Reinforcement of the operant response by fear reduction. Because, through the first process, the CS signaling the aversive US has itself become aversive by eliciting fear in the organism, reducing this unpleasant emotional reaction serves to motivate the operant response. The organism learns to make the response during the CS, thus terminating the aversive internal reaction it elicits. An important aspect of this theory is that the term "avoidance" does not really describe what the organism is doing: it does not "avoid" the aversive US in the sense of anticipating it. Rather, the organism escapes an aversive internal state caused by the CS. (A toy version of the two processes is sketched below.)
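As a purely illustrative rendering of the two processes (my own toy parameters, not part of the theory's formal statement): fear accrues on CS-US pairings via a simple error-correcting update, and the avoidance response is strengthened in proportion to the fear it removes.

```python
import random

fear = 0.0         # conditioned fear elicited by the CS
p_response = 0.05  # probability of the avoidance response during the CS

for trial in range(200):
    responded = random.random() < p_response
    if responded:
        # Process (b): fear reduction reinforces the operant response.
        p_response = min(0.99, p_response + 0.1 * fear)
    else:
        # Process (a): CS-US pairing strengthens fear
        # (learning rate 0.2, asymptote 1.0, assumed values).
        fear += 0.2 * (1.0 - fear)

print(f"fear = {fear:.2f}, p(avoidance response) = {p_response:.2f}")
```

Running this, fear climbs quickly over the early "escape" trials, after which each response is reinforced strongly enough that responding comes to dominate, mirroring the transition to avoidance trials.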

Verbal Behavior
In 1957, Skinner published Verbal Behavior, a theoretical extension of the work he had pioneered since 1938. This work extended the theory of operant conditioning to human behavior previously assigned to the fields of language and linguistics, among others. Verbal Behavior is the logical extension of Skinner's ideas, in which he introduced new functional relationship categories such as intraverbals, autoclitics, mands, tacts, and the controlling relationship of the audience. All of these relationships were based on operant conditioning and relied on no new mechanisms, despite the introduction of new functional categories.

Four term contingency


Applied behavior analysis, the discipline directly descended from Skinner's work, holds that behavior is explained in four terms: a conditional stimulus (S^C), a discriminative stimulus (S^d), a response (R), and a reinforcing stimulus (S^rein or S^r for reinforcers, sometimes S^ave for aversive stimuli).[14]

Operant hoarding
Operant hoarding refers to the choice made by a rat, on a compound schedule called a multiple schedule, that maximizes its rate of reinforcement in an operant conditioning context. More specifically, rats were shown to allow food pellets to accumulate in a food tray by continuing to press a lever on a continuous reinforcement schedule instead of retrieving those pellets. Retrieval of the pellets always instituted a one-minute period of extinction during which no additional food pellets were available, but those that had accumulated earlier could be consumed. This finding appears to contradict the usual finding that rats behave impulsively in situations in which there is a choice between a smaller food object right away and a larger food object after some delay. See schedules of reinforcement.[15]

An alternative to the law of effect


An alternative perspective has been proposed by R. Allen and Beatrix Gardner.[16][17] Under this idea, which they called "feedforward," animals learn during operant conditioning by simple pairing of stimuli, rather than by the consequences of their actions. Skinner asserted that a rat or pigeon would only manipulate a lever if rewarded for the action, a process he called "shaping" (reward for approaching and then manipulating a lever).[18] However, in order to prove that reward (reinforcement) is necessary for lever pressing, a control condition in which food is delivered without regard to behavior must also be run. Skinner never published this control group. Only much later was it found that rats and pigeons do indeed learn to manipulate a lever when food comes irrespective of behavior. This phenomenon is known as autoshaping.[19] Autoshaping demonstrates that the consequence of an action is not necessary in an operant conditioning chamber, and it contradicts the law of effect. Further experimentation has shown that rats naturally handle small objects, such as a lever, when food is present.[20] Rats seem to insist on handling the lever when free food is available (contra-freeloading)[21][22] and even when pressing the lever leads to less food (omission training).[23][24] Whenever food is presented, rats handle the lever, regardless of whether lever pressing leads to more food. Therefore, handling a lever is a natural behavior that rats perform as preparatory feeding activity, and in turn, lever pressing cannot logically be used as evidence for reward or reinforcement. In the absence of evidence for reinforcement during operant conditioning, the learning which occurs during operant experiments is actually only Pavlovian (classical) conditioning. The dichotomy between Pavlovian and operant conditioning is therefore an inappropriate separation.

Theory

B. F. Skinner's entire system is based on operant conditioning. The organism is in the process of operating on the environment, which in ordinary terms means it is bouncing around its world, doing what it does. During this operating, the organism encounters a special kind of stimulus, called a reinforcing stimulus, or simply a reinforcer. This special stimulus has the effect of increasing the operant, that is, the behavior occurring just before the reinforcer. This is operant conditioning: the behavior is followed by a consequence, and the nature of the consequence modifies the organism's tendency to repeat the behavior in the future.

Imagine a rat in a cage. This is a special cage (called, in fact, a Skinner box) that has a bar or pedal on one wall that, when pressed, causes a little mechanism to release a food pellet into the cage. The rat is bouncing around the cage, doing whatever it is rats do, when he accidentally presses the bar and, hey presto, a food pellet falls into the cage! The operant is the behavior just prior to the reinforcer, which is the food pellet, of course. In no time at all, the rat is furiously peddling away at the bar, hoarding his pile of pellets in the corner of the cage. A behavior followed by a reinforcing stimulus results in an increased probability of that behavior occurring in the future.

What if you don't give the rat any more pellets? Apparently, he's no fool, and after a few futile attempts, he stops his bar-pressing behavior. This is called extinction of the operant behavior. A behavior no longer followed by the reinforcing stimulus results in a decreased probability of that behavior occurring in the future.

Now, if you were to turn the pellet machine back on, so that pressing the bar again provides the rat with pellets, the behavior of bar-pushing will pop right back into existence, much more quickly than it took for the rat to learn the behavior the first time. This is because the return of the reinforcer takes place in the context of a reinforcement history that goes all the way back to the very first time the rat was reinforced for pushing on the bar!

Schedules of reinforcement

Skinner likes to tell about how he accidentally, i.e. operantly, came across his various discoveries. For example, he talks about running low on food pellets in the middle of a study. These were the days before Purina rat chow and the like, so Skinner had to make his own rat pellets, a slow and tedious task. So he decided to reduce the number of reinforcements he gave his rats for whatever behavior he was trying to condition, and, lo and behold, the rats kept up their operant behaviors, and at a stable rate, no less. This is how Skinner discovered schedules of reinforcement!
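The bar-pressing story above reads almost like pseudocode. Here is a hedged Python sketch of it; the probabilities and increments are my own illustrative choices, not measured values.

```python
import random

def run(p_press: float, magazine_on: bool, opportunities: int = 500) -> float:
    """Toy Skinner box: reinforcement strengthens pressing, its absence extinguishes it."""
    for _ in range(opportunities):
        if random.random() < p_press:                # the rat presses the bar
            if magazine_on:                          # pellet delivered
                p_press = min(0.95, p_press + 0.05)  # behavior strengthened
            else:                                    # no pellet: extinction
                p_press = max(0.01, p_press - 0.05)
    return p_press

if __name__ == "__main__":
    acquired = run(p_press=0.05, magazine_on=True)
    print(f"after reinforcement: p(press) = {acquired:.2f}")
    extinguished = run(p_press=acquired, magazine_on=False)
    print(f"after extinction:    p(press) = {extinguished:.2f}")
```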

Continuous reinforcement is the original scenario: every time the rat does the behavior (such as pedal-pushing), he gets a rat goodie.

The fixed ratio schedule was the first one Skinner discovered: if the rat presses the pedal three times, say, he gets a goodie. Or five times. Or twenty times. Or x times. There is a fixed ratio between behaviors and reinforcers: 3 to 1, 5 to 1, 20 to 1, and so on. This is a little like piece rate in the clothing manufacturing industry: you get paid so much for so many shirts.

The fixed interval schedule uses a timing device of some sort. If the rat presses the bar at least once during a particular stretch of time (say 20 seconds), then he gets a goodie. If he fails to do so, he doesn't get a goodie. But even if he hits that bar a hundred times during that 20 seconds, he still only gets one goodie! One strange thing that happens is that the rats tend to pace themselves: they slow down the rate of their behavior right after the reinforcer, and speed up when the time for it gets close.

Skinner also looked at variable schedules. Variable ratio means you change the x each time: first it takes 3 presses to get a goodie, then 10, then 1, then 7, and so on. Variable interval means you keep changing the time period: first 20 seconds, then 5, then 35, then 10, and so on. In both cases, it keeps the rats on their rat toes. With the variable interval schedule, they no longer pace themselves, because they can no longer establish a rhythm between behavior and reward. Most importantly, these schedules are very resistant to extinction. It makes sense, if you think about it: if you haven't gotten a reinforcer for a while, well, it could just be that you are at a particularly bad ratio or interval! Just one more bar press, maybe this'll be the one!

This, according to Skinner, is the mechanism of gambling. You may not win very often, but you never know whether and when you'll win again. It could be the very next time, and if you don't roll them dice, or play that hand, or bet on that number this once, you'll miss on the score of the century!

Shaping

A question Skinner had to deal with was how we get to more complex sorts of behaviors. He responded with the idea of shaping, or the method of successive approximations. Basically, it involves first reinforcing a behavior only vaguely similar to the one desired. Once that is established, you look out for variations that come a little closer to what you want, and so on, until you have the animal performing a behavior that would never show up in ordinary life. Skinner and his students have been quite successful in teaching simple animals to do some quite extraordinary things. My favorite is teaching pigeons to bowl!

I used shaping on one of my daughters once. She was about three or four years old, and was afraid to go down a particular slide. So I picked her up, put her at the end of the slide, asked if she was okay and if she could jump down. She did, of course, and I showered her with praise. I then picked her up and put her a foot or so up the slide, asked her if she was okay, and asked her to slide down and jump off. So far so good. I repeated this again and again, each time moving her a little higher up the slide, and backing off if she got nervous. Eventually, I could put her at the top of the slide and she could slide all the way down and jump off. Unfortunately, she still couldn't climb up the ladder, so I was a very busy father for a while.

This is the same method used in the therapy called systematic desensitization, invented by another behaviorist named Joseph Wolpe. A person with a phobia, say of spiders, would be asked to come up with ten scenarios involving spiders and panic of one degree or another. The first scenario would be a very mild one, say seeing a small spider at a great distance outdoors. The second would be a little more scary, and so on, until the tenth scenario would involve something totally terrifying, say a tarantula climbing on your face while you're driving your car at a hundred miles an hour! The therapist then teaches you how to relax your muscles, which is incompatible with anxiety. After you practice that for a few days, you come back, and you and the therapist go through your scenarios, one step at a time, making sure you stay relaxed, backing off if necessary, until you can finally imagine the tarantula while remaining perfectly tension-free.

This is a technique quite near and dear to me because I did in fact have a spider phobia, and did in fact get rid of it with systematic desensitization. It worked so well that, after one session (beyond the original scenario-writing and muscle-training session), I could go out and pick up a daddy-long-legs. Cool.

Beyond these fairly simple examples, shaping also accounts for the most complex of behaviors. You don't, for example, become a brain surgeon by stumbling into an operating theater, cutting open someone's head, successfully removing a tumor, and being rewarded with prestige and a hefty paycheck, along the lines of the rat in the Skinner box. Instead, you are gently shaped by your environment to enjoy certain things, do well in school, take a certain bio class, see a doctor movie perhaps, have a good hospital visit, enter med school, be encouraged to drift toward brain surgery as a specialty, and so on. This could be something your parents were carefully doing to you, as if you were a rat in a cage. But much more likely, this is something that was more or less unintentional.

Aversive stimuli

An aversive stimulus is the opposite of a reinforcing stimulus, something we might find unpleasant or painful. A behavior followed by an aversive stimulus results in a decreased probability of the behavior occurring in the future. This both defines an aversive stimulus and describes the form of conditioning known as punishment. If you shock a rat for doing x, it'll do a lot less of x. If you spank Johnny for throwing his toys, he will throw his toys less and less (maybe).

On the other hand, if you remove an already active aversive stimulus after a rat or Johnny performs a certain behavior, you are doing negative reinforcement. If you turn off the electricity when the rat stands on his hind legs, he'll do a lot more standing. If you stop your perpetual nagging when I finally take out the garbage, I'll be more likely to take out the garbage (perhaps). You could say it feels so good when the aversive stimulus stops that this serves as a reinforcer! Behavior followed by the removal of an aversive stimulus results in an increased probability of that behavior occurring in the future.

Notice how difficult it can be to distinguish some forms of negative reinforcement from positive reinforcement: if I starve you, is the food I give you when you do what I want a positive, i.e. a reinforcer? Or is it the removal of a negative, i.e. the aversive stimulus of hunger?

Skinner (contrary to some stereotypes that have arisen about behaviorists) doesn't approve of the use of aversive stimuli, not because of ethics, but because they don't work well! Notice that I said earlier that Johnny will maybe stop throwing his toys, and that I perhaps will take out the garbage? That's because whatever was reinforcing the bad behaviors hasn't been removed, as it would've been in the case of extinction. This hidden reinforcer has just been covered up with a conflicting aversive stimulus. So, sure, sometimes the child (or me) will behave, but it still feels good to throw those toys. All Johnny needs to do is wait till you're out of the room, or find a way to blame it on his brother, or in some way escape the consequences, and he's back to his old ways. In fact, because Johnny now only gets to enjoy his reinforcer occasionally, he's gone onto a variable schedule of reinforcement, and he'll be even more resistant to extinction than ever!

Behavior modification

Behavior modification, often referred to as b-mod, is the therapy technique based on Skinner's work. It is very straightforward: extinguish an undesirable behavior (by removing the reinforcer) and replace it with a desirable behavior by reinforcement. It has been used on all sorts of psychological problems, such as addictions, neuroses, shyness, autism, even schizophrenia, and works particularly well with children. There are examples of back-ward psychotics who haven't communicated with others for years who have been conditioned to behave themselves in fairly normal ways, such as eating with a knife and fork, taking care of their own hygiene needs, dressing themselves, and so on.

There is an offshoot of b-mod called the token economy. This is used primarily in institutions such as psychiatric hospitals, juvenile halls, and prisons. Certain rules are made explicit in the institution, and behaving yourself appropriately is rewarded with tokens: poker chips, tickets, funny money, recorded notes, etc. Certain poor behavior is also often followed by a withdrawal of these tokens. The tokens can be traded in for desirable things such as candy, cigarettes, games, movies, time out of the institution, and so on. This has been found to be very effective in maintaining order in these often difficult institutions. (A minimal ledger for such a system is sketched at the end of this section.)

There is a drawback to the token economy: when an inmate of one of these institutions leaves, they return to an environment that reinforces the kinds of behaviors that got them into the institution in the first place. The psychotic's family may be thoroughly dysfunctional. The juvenile offender may go right back to the hood. No one is giving them tokens for eating politely. The only reinforcements may be attention for acting out, or some gang glory for robbing a Seven-Eleven. In other words, the environment doesn't travel well!
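A token economy is essentially a ledger with earning rules, response costs, and a store of back-up reinforcers. This is a minimal sketch of my own; the behaviors, token values, and prices are entirely hypothetical.

```python
# Earning rules, response costs, and back-up reinforcers (all hypothetical).
EARN = {"made bed": 2, "attended session": 3, "ate with utensils": 1}
COST = {"fighting": 5, "missed curfew": 3}
STORE = {"candy": 4, "movie night": 10, "day pass": 25}

class TokenAccount:
    """Tracks one resident's token balance."""

    def __init__(self) -> None:
        self.balance = 0

    def record(self, behavior: str) -> None:
        # Desired behaviors add tokens; rule violations subtract them.
        self.balance += EARN.get(behavior, 0) - COST.get(behavior, 0)

    def redeem(self, item: str) -> bool:
        # Tokens are exchanged for back-up reinforcers if affordable.
        price = STORE[item]
        if self.balance >= price:
            self.balance -= price
            return True
        return False

if __name__ == "__main__":
    acct = TokenAccount()
    for b in ["made bed", "attended session", "fighting", "made bed"]:
        acct.record(b)
    # 2 + 3 - 5 + 2 = 2 tokens; candy costs 4, so redemption fails.
    print(acct.balance, acct.redeem("candy"))
```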

Walden II

Skinner started his career as an English major, writing poems and short stories. He has, of course, written a large number of papers and books on behaviorism. But he will probably be most remembered by the general run of readers for his book Walden II, wherein he describes a utopia-like commune run on his operant principles.

People, especially the religious right, came down hard on his book. They said that his ideas take away our freedom and dignity as human beings. He responded to the sea of criticism with another book (one of his best) called Beyond Freedom and Dignity. He asked: What do we mean when we say we want to be free? Usually we mean we don't want to be in a society that punishes us for doing what we want to do. Okay: aversive stimuli don't work well anyway, so out with them! Instead, we'll only use reinforcers to control society. And if we pick the right reinforcers, we will feel free, because we will be doing what we feel we want!

Likewise for dignity. When we say "she died with dignity," what do we mean? We mean she kept up her good behaviors without any apparent ulterior motives. In fact, she kept her dignity because her reinforcement history has led her to see behaving in that "dignified" manner as more reinforcing than making a scene.

The bad do bad because the bad is rewarded. The good do good because the good is rewarded. There is no true freedom or dignity. Right now, our reinforcers for good and bad behavior are chaotic and out of our control; it's a matter of having good or bad luck with your choice of parents, teachers, peers, and other influences. Let's instead take control, as a society, and design our culture in such a way that good gets rewarded and bad gets extinguished! With the right behavioral technology, we can design culture.

Both freedom and dignity are examples of what Skinner calls mentalistic constructs: unobservable, and so useless for a scientific psychology. Other examples include defense mechanisms, the unconscious, archetypes, fictional finalisms, coping strategies, self-actualization, consciousness, even things like hunger and thirst. The most important example is what he refers to as the homunculus, Latin for "the little man," that supposedly resides inside us and is used to explain our behavior: ideas like soul, mind, ego, will, self, and, of course, personality. Instead, Skinner recommends that psychologists concentrate on observables, that is, the environment and our behavior in it.

Principles:

1. Behavior that is positively reinforced will reoccur; intermittent reinforcement is particularly effective.
2. Information should be presented in small amounts so that responses can be reinforced ("shaping").
3. Reinforcements will generalize across similar stimuli ("stimulus generalization"), producing secondary conditioning.

"No Reward Markers" and "Keep Going Signals"


There's actually a fifth possible consequence to any behavior: nothing. You push the button and nothing happens. You raise your hand and the teacher doesn't call on you. You get no response to your e-mail, your proposal, or your job application. The question you then have is: did no one notice your behavior? Or was it just not worthy of a reinforcement?

To differentiate between these two possibilities, a trainer can use a no reward marker (NRM). The NRM tells the animal that its behavior will not gain it a reinforcer. A lot of dog trainers use "Nope!", "Wrong!", "Uh-uh!", or "Try again" as NRMs. For example, if you're teaching your dog to sit in response to the cue "sit" (it's not as obvious to the dog as it is to you; after all, dogs don't have the experience of verbal words being labels for actions), and the dog lies down or barks, you can give an NRM. The purpose of the NRM is to get the animal to try something different. It is not a conditioned punisher and should not be used when the dog does something you don't ever want it to do; it's for when a behavior might be correct in a different circumstance but not in this one.

Some trainers have also developed a keep going signal (KGS). This signal tells the animal that it's on the right track, that its behavior is leading to something that will gain it a reinforcer. For example, if you're teaching a dog to roll over and it lies on its side, you can use a KGS to tell it that it's close to a behavior that will get it a reward, but not there yet.
